QA Data Engineer - Remote

$84,000 - $142,000/Yr

NTT DATA - Frisco, TX

posted about 2 months ago

Full-time - Mid Level
Remote - Frisco, TX
10,001+ employees
Professional, Scientific, and Technical Services

About the position

NTT DATA is seeking a QA Data Engineer to join our team remotely, based out of Frisco, Texas. This role is pivotal in ensuring the quality and reliability of our data pipelines within an agile development environment. The successful candidate will design, develop, and maintain comprehensive test plans and test cases for scalable data pipelines, implement automated testing solutions using Python and PySpark on a cloud-native Lakehouse data platform, and write efficient SQL queries to validate data extraction, transformation, and loading processes.

Collaboration is key in this role: you will work closely with product management and analysts to understand data requirements and ensure quality assurance throughout the development lifecycle. You will also optimize and troubleshoot data pipelines to improve performance and reliability, ensuring data quality and integrity through rigorous testing and validation. Following DevOps principles, you will use CI/CD practices to deploy and operate automated testing frameworks, contributing to a culture of continuous improvement and innovation.

The ideal candidate has a strong background in quality assurance with a focus on data engineering. You will need proficiency in Python and PySpark, a solid understanding of SQL and database management, and familiarity with software development patterns and quality assurance best practices. Experience with ETL/ELT processes and data pipeline testing is essential, as is the ability to develop using version control and automated testing tools, particularly git-based tools such as GitHub and GitHub Actions.

Responsibilities

  • Design, develop, and maintain comprehensive test plans and test cases for scalable data pipelines.
  • Implement automated testing solutions using Python and PySpark on a cloud-native Lakehouse data platform (see the test sketch after this list).
  • Write efficient SQL queries to validate data extraction, transformation, and loading processes.
  • Collaborate with product management and analysts to understand data requirements and ensure quality assurance.
  • Optimize and troubleshoot data pipelines for performance and reliability.
  • Ensure data quality and integrity through comprehensive testing and validation processes.
  • Follow DevOps principles and use CI/CD to deploy and operate automated testing frameworks.

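To illustrate the kind of automated pipeline testing these responsibilities describe, here is a minimal sketch using pytest and PySpark. It is not NTT DATA's actual framework: the clean_orders transform and the order_id/order_date columns are invented for the example.

```python
# Hypothetical pytest suite validating a PySpark transformation.
# Table and column names are invented for illustration; they do not
# reflect NTT DATA's actual pipelines.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local Spark session for fast, self-contained test runs.
    return SparkSession.builder.master("local[2]").appName("qa-tests").getOrCreate()


def clean_orders(df):
    """Example transform under test: drop rows missing an order_id."""
    return df.filter(F.col("order_id").isNotNull())


def test_clean_orders_removes_null_ids(spark):
    raw = spark.createDataFrame(
        [(1, "2024-01-01"), (None, "2024-01-02")],
        ["order_id", "order_date"],
    )
    cleaned = clean_orders(raw)
    # No null keys should survive the transform.
    assert cleaned.filter(F.col("order_id").isNull()).count() == 0
    # Exactly one valid row should remain.
    assert cleaned.count() == 1
```

In practice, tests like these would run against the Lakehouse platform rather than a local Spark session, typically triggered from CI/CD.
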
Requirements

  • Proficiency in Python and PySpark.
  • Strong experience with SQL and database management.
  • Knowledge of software development patterns and best practices in quality assurance.
  • Experience with ETL/ELT processes and data pipeline testing (a reconciliation sketch follows this list).
  • Proficiency with version control, automated testing, and deployments using git-based tools such as GitHub and GitHub Actions.

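As a minimal sketch of the SQL-based validation called for above, the snippet below reconciles row counts between a staging table and its transformed target via Spark SQL. The staging.orders and warehouse.orders names are assumptions for illustration only.

```python
# Hypothetical reconciliation check: compare row counts between a source
# staging table and its transformed target. Table names are assumptions
# for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-validation").getOrCreate()

source_count = spark.sql("SELECT COUNT(*) AS n FROM staging.orders").first()["n"]
target_count = spark.sql(
    "SELECT COUNT(*) AS n FROM warehouse.orders WHERE order_id IS NOT NULL"
).first()["n"]

# Loads should be lossless apart from rows dropped for null keys,
# so the target should never contain more rows than the source.
if source_count < target_count:
    raise AssertionError(
        f"Target has more rows ({target_count}) than source ({source_count})"
    )
```
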
Nice-to-haves

  • Understanding of testing methodologies for data pipelines, including unit testing, integration testing, and end-to-end testing.
  • Knowledge of data governance and data security best practices.
  • Familiarity with data warehousing concepts and tools.
  • Experience with cloud platforms (e.g., Azure, AWS, GCP), with Azure preferred.
  • Knowledge of big data technologies (e.g., Microsoft Fabric, Azure Synapse, Lakehouse, Databricks).
  • Familiarity with advanced data orchestration tooling and development frameworks such as dbt or Airflow (see the DAG sketch after this list).
  • Experience working in a healthcare-related industry.
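
For the orchestration item above, a minimal Airflow DAG sketch is shown below, wiring a data-quality task into a daily schedule. The DAG id, schedule, and task body are hypothetical, and the schedule argument assumes Airflow 2.4+ (older releases use schedule_interval).

```python
# Hypothetical Airflow DAG that runs a data-quality check on a daily
# schedule. DAG id, schedule, and task logic are invented for
# illustration; `schedule` assumes Airflow 2.4+ (older versions use
# `schedule_interval`).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_quality_checks():
    # Placeholder: in practice this might invoke the pytest/PySpark
    # suite or the SQL reconciliation sketched earlier in this posting.
    print("running data quality checks")


with DAG(
    dag_id="orders_pipeline_qa",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="quality_check",
        python_callable=run_quality_checks,
    )
```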