HouseWorks Home Care - Woburn, MA

posted 15 days ago

Full-time - Mid Level
Woburn, MA

About the position

HouseWorks is seeking an ETL Developer to join our data engineering team. This role involves designing, developing, and maintaining scalable ETL pipelines using Apache Spark and Python within the AWS ecosystem. The ETL Developer will collaborate with data analysts and scientists to ensure data availability and reliability for analytical purposes, contributing to our mission of improving the lives of seniors through innovative home care solutions.

Responsibilities

  • Design, develop, and implement ETL workflows, data models, and pipelines using PySpark (see the illustrative sketch after this list).
  • Extract data from multiple sources, transform it for consistency, and load it into a data lake or repository.
  • Optimize PySpark scripts for performance, scalability, and efficiency, ensuring minimal resource consumption.
  • Ensure data accuracy and reliability through quality checks, error handling, and validations within ETL pipelines.
  • Monitor, troubleshoot, and enhance ETL pipeline performance.
  • Automate ETL processes and integrate them with scheduling tools such as Apache Airflow, Amazon EventBridge, or similar.
  • Manage and optimize data storage, compute resources, and security configurations on cloud platforms (e.g., AWS, Azure, GCP).
  • Develop and maintain technical documentation for ETL processes, workflows, and data definitions.
  • Participate in code reviews, provide feedback, and collaborate within an agile development environment.
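For illustration only, below is a minimal sketch of the kind of PySpark ETL workflow these responsibilities describe: extract raw records, apply basic quality checks, and load curated data to a data lake. All paths, bucket names, and column names are hypothetical placeholders, not HouseWorks systems.

    # Minimal PySpark ETL sketch (hypothetical paths and columns)
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("visits_etl").getOrCreate()

    # Extract: read raw visit records from a source location
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/visits/")

    # Transform: normalize types, drop duplicates, and filter out rows
    # that fail a simple validation (missing client_id)
    clean = (
        raw.withColumn("visit_date", F.to_date("visit_date", "yyyy-MM-dd"))
           .withColumn("duration_minutes", F.col("duration_minutes").cast("int"))
           .dropDuplicates(["visit_id"])
           .filter(F.col("client_id").isNotNull())
    )

    # Load: write partitioned Parquet to the data lake
    (clean.write
          .mode("overwrite")
          .partitionBy("visit_date")
          .parquet("s3://example-bucket/curated/visits/"))

    spark.stop()

In practice, a job like this would be packaged and triggered by an orchestrator such as Apache Airflow or an EventBridge schedule, as noted above.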

Requirements

  • 3+ years of experience in designing and developing ETL pipelines using PySpark.
  • Bachelor's degree in Computer Science, Data Analytics, IT, or a related field strongly preferred.
  • Strong proficiency with Apache Spark and the Python programming language.
  • Experience working with distributed data processing systems and big data technologies.
  • Solid understanding of SQL and experience working with relational and non-relational databases (e.g., PostgreSQL, MySQL, MongoDB).
  • Experience with cloud platforms such as AWS (e.g., S3, EMR), Azure, or Google Cloud Platform.
  • Familiarity with workflow orchestration tools and processes.
  • Strong experience with data modeling, ETL best practices, and handling large-scale data transformations.
  • Hands-on experience in optimizing and tuning PySpark jobs for better performance and cost efficiency.
  • Familiarity with version control systems such as Git.
  • Understanding of data governance, data quality, and data security principles.
  • Strong problem-solving and analytical skills, with the ability to troubleshoot and identify root causes of data issues.
  • Knowledge of Kafka, AWS Glue, Databricks, or other real-time and cloud data integration tools.
  • Understanding of data warehousing concepts and experience with data warehouse solutions such as Snowflake, Redshift, or BigQuery.
  • Strong interpersonal skills with the ability to communicate effectively with both technical and non-technical stakeholders.

Benefits

  • 401k
  • Medical, Vision & Dental Insurance
  • PTO, Sick Time, Floating Holidays