Iic - Dallas, TX

posted 4 months ago

Full-time - Mid Level
Dallas, TX
Insurance Carriers and Related Activities

About the position

The ETL/Hadoop Developer position is a contract role based in Dallas, TX, or Charlotte, NC, with a duration of 6+ months. The ideal candidate is an experienced ETL Developer proficient in Hadoop ecosystem tools such as PySpark, Hive, and Sqoop, with strong Python programming skills, a solid understanding of data warehousing concepts, and hands-on experience designing, developing, and maintaining ETL processes for large-scale data sets.

The developer will implement ETL processes that extract, transform, and load data from various sources into the data warehouse while ensuring data quality and compliance with governance policies. They will collaborate with data architects, data engineers, and other stakeholders to understand data requirements and deliver effective solutions; optimize ETL workflows for performance and scalability; write complex SQL queries for data transformation; and develop Python automation scripts for scheduling and monitoring ETL jobs. Troubleshooting issues related to data consistency and performance tuning, and documenting ETL processes, data mappings, and data dictionaries for reference and reporting, are also key responsibilities.
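
A minimal sketch of the kind of PySpark-to-Hive ETL flow this role involves. The source path, column names, and the edw.claims_fact target table are illustrative assumptions, not details from this posting:

```python
# Illustrative PySpark ETL sketch: extract from HDFS, transform, load into Hive.
# All paths, columns, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims_etl_sketch")        # hypothetical job name
    .enableHiveSupport()                 # enables writing to Hive-managed tables
    .getOrCreate()
)

# Extract: read raw data previously landed in HDFS (e.g., ingested via Sqoop).
claims = spark.read.parquet("/data/raw/claims")      # hypothetical landing path

# Transform: deduplicate, enforce types, and stamp audit columns.
cleansed = (
    claims
    .dropDuplicates(["claim_id"])
    .filter(F.col("claim_id").isNotNull())
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(12,2)"))
    .withColumn("load_date", F.current_date())
    .withColumn("load_ts", F.current_timestamp())
)

# Load: write into a partitioned Hive table in the warehouse schema.
(
    cleansed.write
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("edw.claims_fact")      # hypothetical Hive database.table
)
```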

Responsibilities

  • Design, develop, and implement ETL processes using Hadoop ecosystem tools (PySpark, Hive, Sqoop) to extract, transform, and load data from various sources to the data warehouse.
  • Collaborate with data architects, data engineers, and other stakeholders to understand data requirements and implement solutions accordingly.
  • Optimize ETL workflows for performance and scalability, ensuring efficient data processing across the Hadoop cluster.
  • Write complex SQL queries and scripts to transform and cleanse data as per business requirements.
  • Develop and maintain automation scripts using Python for scheduling and monitoring ETL jobs (an illustrative sketch follows this list).
  • Troubleshoot and resolve issues related to data consistency, data quality, and performance tuning.
  • Ensure compliance with data governance and security policies throughout the ETL process.
  • Document ETL processes, data mappings, and data dictionaries for reference and reporting purposes.
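
The scheduling and monitoring responsibility above is often met with a thin Python wrapper that a scheduler such as AutoSys can invoke. The spark-submit arguments, job script path, and log location below are illustrative assumptions:

```python
# Minimal sketch of a Python wrapper for launching and monitoring an ETL job.
# The spark-submit arguments, job script path, and log file are hypothetical.
import logging
import subprocess
import sys

logging.basicConfig(
    filename="/var/log/etl/claims_etl.log",   # hypothetical log path
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def run_etl_job() -> int:
    """Launch the PySpark ETL job and return its exit code for the scheduler."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "/opt/etl/claims_etl.py",              # hypothetical job script
    ]
    logging.info("Starting ETL job: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("ETL job failed with code %s: %s", result.returncode, result.stderr)
    else:
        logging.info("ETL job completed successfully.")
    return result.returncode

if __name__ == "__main__":
    # A non-zero exit code lets the scheduler (e.g., AutoSys) flag the job as failed.
    sys.exit(run_etl_job())
```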

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as an ETL Developer or in a similar role, with a focus on Hadoop ecosystem tools.
  • Strong proficiency in PySpark for data processing and Hive for data warehousing.
  • Hands-on experience with Sqoop for data ingestion between Hadoop and relational databases.
  • Advanced programming skills in Python for automation and scripting.
  • Familiarity with workflow scheduling tools like AutoSys for job automation and monitoring.
  • Experience with version control systems such as Bitbucket for managing codebase.
  • Knowledge of deployment automation tools like XLR for managing ETL deployments.
  • Strong analytical and problem-solving skills with a keen attention to detail.
  • Excellent communication and collaboration skills to work effectively within a team environment.
  • Ability to adapt to new technologies and learn quickly in a fast-paced environment.