NTT DATA - Frisco, TX

posted about 2 months ago

Full-time - Mid Level
Remote - Frisco, TX
10,001+ employees
Professional, Scientific, and Technical Services

About the position

As a QA Data Engineer at NTT DATA, you will play a crucial role in ensuring the quality and reliability of our data pipelines within an agile development environment. Your primary responsibilities will include designing, developing, and maintaining comprehensive test plans and test cases tailored for scalable data pipelines. You will implement automated testing solutions using Python and PySpark on our cloud-native Lakehouse data platform, ensuring that our data processes are efficient and robust.

Writing efficient SQL queries will be essential for validating data extraction, transformation, and loading processes, which are critical to maintaining data integrity. Collaboration is key in this role: you will work closely with product management and analysts to understand data requirements and ensure that quality assurance standards are met. You will also optimize and troubleshoot data pipelines to enhance performance and reliability.

Following DevOps principles, you will use CI/CD practices to deploy and operate automated testing frameworks, contributing to a culture of continuous improvement and innovation. Your expertise in quality assurance will be vital to ensuring data quality and integrity through comprehensive testing and validation. This position offers the opportunity to work in a dynamic environment where your contributions will directly impact the success of our data initiatives and NTT DATA's overall business objectives.

Responsibilities

  • Design, develop, and maintain comprehensive test plans and test cases for scalable data pipelines.
  • Implement automated testing solutions using Python and PySpark on a cloud-native Lakehouse data platform.
  • Write efficient SQL queries to validate data extraction, transformation, and loading processes.
  • Collaborate with product management and analysts to understand data requirements and ensure quality assurance.
  • Optimize and troubleshoot data pipelines for performance and reliability.
  • Ensure data quality and integrity through comprehensive testing and validation processes.
  • Follow DevOps principles and use CI/CD to deploy and operate automated testing frameworks.
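To give a concrete sense of the SQL-based validation work described above, here is a minimal illustrative sketch. The table name `orders`, its columns, and the specific checks are hypothetical assumptions, and SQLite stands in for the real Lakehouse engine; the actual role would run comparable checks with PySpark on the cloud platform.

```python
import sqlite3


def run_data_quality_checks(conn: sqlite3.Connection) -> dict:
    """Run simple SQL-based data quality checks against a hypothetical
    'orders' table and return a pass/fail result per check."""
    checks = {
        # Row count: the load should not produce an empty table.
        "non_empty": "SELECT COUNT(*) > 0 FROM orders",
        # Completeness: the primary key must never be NULL.
        "no_null_ids": "SELECT COUNT(*) = 0 FROM orders WHERE order_id IS NULL",
        # Uniqueness: no duplicate primary keys.
        "unique_ids": "SELECT COUNT(*) = COUNT(DISTINCT order_id) FROM orders",
        # Range: monetary amounts must be non-negative.
        "non_negative_amounts": "SELECT COUNT(*) = 0 FROM orders WHERE amount < 0",
    }
    return {name: bool(conn.execute(sql).fetchone()[0]) for name, sql in checks.items()}


if __name__ == "__main__":
    # In-memory fixture standing in for the output of an ETL load.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (2, 25.5), (3, 0.0)])
    results = run_data_quality_checks(conn)
    assert all(results.values()), f"failed checks: {results}"
```

In a pytest suite, each check would typically become its own parametrized test case so that failures are reported individually.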

Requirements

  • Proficiency in Python and PySpark.
  • Strong experience with SQL and database management.
  • Knowledge of software development patterns and best practices in quality assurance.
  • Experience with ETL/ELT processes and data pipeline testing.
  • Proficiency with version control, automated testing, and deployment workflows using Git-based tools such as GitHub and GitHub Actions.
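
The Git-based CI/CD workflow named above might look roughly like this GitHub Actions sketch; the workflow name, repository layout, and `tests/` path are illustrative assumptions, not details from the posting:

```yaml
# Hypothetical workflow: run the automated pipeline tests on every push and PR.
name: data-pipeline-tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest pyspark
      - run: pytest tests/
```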

Nice-to-haves

  • Understanding of testing methodologies for data pipelines, including unit testing, integration testing, and end-to-end testing.
  • Knowledge of data governance and data security best practices.
  • Familiarity with data warehousing concepts and tools.
  • Experience with cloud platforms (e.g., Azure, AWS, GCP), with Azure preferred.
  • Knowledge of big data technologies (e.g., Microsoft Fabric, Azure Synapse, Lakehouse, Databricks).
  • Familiarity with advanced data orchestration tooling and development frameworks like dbt or Airflow.
  • Experience working in a healthcare-related industry.