InfoVision - Irving, TX

posted about 2 months ago

Full-time
Irving, TX
Professional, Scientific, and Technical Services

About the position

We are seeking a skilled Data Engineer with extensive experience in big data processing technologies and data architecture. The ideal candidate will have a strong background in big data platforms such as Cloudera, Hortonworks, Snowflake, AWS EMR, Redshift, and AWS Glue. You will be responsible for designing and implementing data pipelines, ensuring the efficient processing and storage of large datasets, and optimizing data workflows to support analytics and reporting needs.

In this role, you will work closely with cross-functional teams to understand data requirements and translate them into technical specifications. You will leverage your expertise in Hadoop, Apache Spark, PySpark, and other big data technologies to build robust data solutions. Your responsibilities will also include maintaining the data warehouse technical architecture and infrastructure components, as well as using reporting and analytics tools to derive insights from data.

The successful candidate will have experience building automated ETL processes and data pipelines, ensuring data quality and integrity throughout the data lifecycle. You will also use scripting languages such as Python, Scala, or shell scripting to automate tasks and improve efficiency. Familiarity with build tools, version control, unit testing, monitoring, and change management practices to support DevOps is essential for this position.
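
For context, below is a minimal PySpark sketch of the kind of automated ETL step described above: extracting raw records, applying basic data-quality checks, and loading the result into a warehouse-style location. All paths, column names, and the table layout are illustrative assumptions, not details taken from this role.

    # Minimal PySpark ETL sketch (illustrative only): read raw CSV events,
    # apply simple data-quality filters, and write partitioned Parquet.
    # Paths and column names are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example_etl").getOrCreate()

    # Extract: load raw event data from a hypothetical source path.
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/events/")

    # Transform: enforce basic data quality -- drop rows missing a key or timestamp,
    # normalize types, derive a partition column, and de-duplicate on the key.
    clean = (
        raw
        .filter(F.col("event_id").isNotNull() & F.col("event_ts").isNotNull())
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .withColumn("event_date", F.to_date("event_ts"))
        .dropDuplicates(["event_id"])
    )

    # Load: write to a warehouse-style location, partitioned by date (hypothetical target).
    (
        clean.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/events/")
    )

    spark.stop()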

Responsibilities

  • Design and implement data pipelines for processing large datasets.
  • Optimize data workflows to support analytics and reporting needs.
  • Collaborate with cross-functional teams to gather data requirements.
  • Maintain data warehouse technical architecture and infrastructure components.
  • Utilize big data technologies such as Hadoop, Apache Spark, and PySpark.
  • Build automated ETL processes to ensure data quality and integrity.
  • Leverage reporting and analytic tools to derive insights from data.
  • Utilize scripting languages for automation and efficiency improvements.
  • Support DevOps practices including build tools, version control, and monitoring.

Requirements

  • Experience with big data processing technologies (Hadoop, Apache Spark, PySpark, Python, Scala, HDFS, Hive, Impala).
  • Proficiency in data warehouse technical architecture and infrastructure components.
  • Experience with big data platforms (Cloudera, Hortonworks, Snowflake, AWS EMR, Redshift, AWS Glue).
  • Experience with database, data warehouse, or data lake solutions.
  • Proven experience in building data pipelines or automated ETL processes.
  • Familiarity with one or more scripting languages (Python, Scala, Shell Scripting).
  • Experience with build tools, version control, unit testing, monitoring, and change management.