Newt Global - Dallas, TX

posted 2 months ago

Full-time - Mid Level
Dallas, TX

About the position

We are seeking an experienced AWS Data Engineer with a strong Snowflake background and banking experience to join our team on a W2 contract basis. The ideal candidate will have over 10 years of total IT experience, with at least 5 years dedicated to Hadoop and big data technologies. This role requires advanced knowledge of the Hadoop ecosystem, including hands-on experience with HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, and Solr.

The successful candidate will design and develop data pipelines for data ingestion and transformation using Scala or Python, with a strong emphasis on Spark programming, particularly PySpark. In this position, you will leverage your expertise in building pipelines with Apache Spark and your familiarity with core AWS services. Hands-on experience with Python and PySpark, along with basic machine learning libraries, is essential. Exposure to containerization technologies such as Docker and Kubernetes, as well as DevOps practices including source control, continuous integration, and deployments, will be beneficial. The role also requires a system-level understanding of data structures, algorithms, and distributed storage and compute.

We are looking for a candidate with a can-do attitude who excels at solving complex business problems and has strong interpersonal and teamwork skills. Team management experience is crucial, as you will lead a team of data engineers and analysts. Experience with Snowflake is a plus, and a Bachelor's degree or equivalent is required.
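For context, the ingestion-and-transformation pipeline work described above typically looks something like the minimal PySpark sketch below. The bucket paths and column names (txn_amt, txn_ts) are hypothetical placeholders for illustration only, not details from this posting.

```python
# Minimal PySpark ingestion-and-transformation sketch (illustrative only).
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("transactions-ingest")
    .getOrCreate()
)

# Ingest raw CSV files (schema inferred here; a production job would declare one).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-bucket/raw/transactions/")
)

# Basic transformations: rename a column, drop null amounts, derive a date column.
cleaned = (
    raw.withColumnRenamed("txn_amt", "amount")
       .filter(F.col("amount").isNotNull())
       .withColumn("txn_date", F.to_date(F.col("txn_ts")))
)

# Write curated output as date-partitioned Parquet for downstream consumers.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://example-bucket/curated/transactions/")
)

spark.stop()
```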

Responsibilities

  • Design and develop data pipelines for data ingestion and transformation using Scala or Python.
  • Build and maintain data processing pipelines using Apache Spark, particularly PySpark.
  • Collaborate with team members to solve complex business problems and enhance data processing capabilities.
  • Manage a team of data engineers and analysts, providing guidance and support.
  • Utilize AWS core provider services to optimize data workflows and storage solutions.
  • Implement containerization technologies such as Docker and Kubernetes in data engineering processes.
  • Engage in DevOps practices including source control, continuous integration, and deployment processes.

Requirements

  • Over 10 years of total IT experience.
  • 5+ years of experience with Hadoop (Cloudera) and big data technologies.
  • Advanced knowledge of the Hadoop ecosystem and Big Data technologies.
  • Hands-on experience with HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, and Solr.
  • Experience designing and developing data pipelines using Scala or Python.
  • Expert-level experience building pipelines using Apache Spark.
  • Familiarity with core AWS provider services.
  • Hands-on experience with Python/PySpark and basic libraries for machine learning.
  • Exposure to containerization technologies (e.g., Docker, Kubernetes).
  • Exposure to DevOps practices (source control, continuous integration, deployments, etc.).
  • Proficient in Python programming; prior Apache Beam/Spark experience is a plus.
  • System-level understanding of data structures, algorithms, and distributed storage and compute.
  • Strong interpersonal and teamwork skills.

Nice-to-haves

  • Experience in Snowflake is a plus.