NTT DATA - Frisco, TX

posted about 2 months ago

Full-time - Mid Level
Remote - Frisco, TX
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Python and PySpark Data Engineer position at NTT DATA involves designing, developing, and maintaining scalable data pipelines on a cloud-native Lakehouse data platform. The role requires collaborating with product management and analysts to understand data requirements, ensuring data quality and integrity through testing and validation processes, and following DevOps principles to deploy and operate data pipelines.

Responsibilities

  • Design, develop, and maintain scalable data pipelines using software development patterns.
  • Implement data processing solutions using Python and PySpark on a cloud-native Lakehouse data platform.
  • Write efficient SQL queries to extract, transform, and load data.
  • Collaborate with product management and analysts to understand data requirements and deliver solutions.
  • Optimize and troubleshoot data pipelines for performance and reliability.
  • Ensure data quality and integrity through comprehensive testing and validation processes.
  • Follow DevOps principles and use CI/CD to deploy and operate data pipelines.
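The core pattern behind the responsibilities above (extract, transform, load, with a data-quality rule applied during transformation) can be sketched in miniature. This is a hedged pure-Python stand-in, not NTT DATA's actual pipeline: all column names and the quality rule are hypothetical, and a real implementation would use PySpark DataFrames writing to a Lakehouse table rather than lists and dicts.

```python
# Miniature extract-transform-load (ETL) sketch in plain Python.
# All table/column names are hypothetical; a production pipeline would
# use PySpark and a Lakehouse table instead of in-memory structures.

import csv
import io

RAW_CSV = """record_id,event_date,charge
1001,2024-01-05,250.00
1002,2024-01-06,
1001,2024-01-09,125.50
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into rows."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: enforce a data-quality rule and cast types."""
    out = []
    for r in rows:
        if not r["charge"]:
            continue  # data-quality rule: drop records with a missing charge
        out.append({"record_id": int(r["record_id"]),
                    "event_date": r["event_date"],
                    "charge": float(r["charge"])})
    return out

def load(rows: list[dict]) -> dict[int, float]:
    """Load: aggregate total charge per record_id (stand-in for a table write)."""
    totals: dict[int, float] = {}
    for r in rows:
        totals[r["record_id"]] = totals.get(r["record_id"], 0.0) + r["charge"]
    return totals

totals = load(transform(extract(RAW_CSV)))
print(totals)  # {1001: 375.5}
```

Keeping each stage a separate, pure function is what makes the "comprehensive testing and validation" responsibility practical: each stage can be unit-tested in isolation.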

Requirements

  • Proficiency in Python and PySpark.
  • Strong experience with SQL and database management.
  • Knowledge of software development patterns and best practices in data engineering.
  • Experience with ETL/ELT processes and data pipeline orchestration.
  • Proficiency with version control, automated testing, and automated deployments using Git-based tools such as GitHub and GitHub Actions.
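The CI/CD requirement above typically looks something like the following GitHub Actions workflow. This is a minimal illustrative sketch, not a workflow from this role: the job names, Python version, and test command are assumptions.

```yaml
# .github/workflows/ci.yml -- hypothetical CI pipeline for a data project
name: ci
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt   # assumed dependency file
      - run: pytest tests/                     # assumed test suite location
```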

Nice-to-haves

  • Understanding of testing methodologies for data pipelines, including unit testing, integration testing, and end-to-end testing.
  • Knowledge of data governance and data security best practices.
  • Familiarity with data warehousing concepts and tools.
  • Experience with cloud platforms (e.g., Azure, AWS, GCP) with Azure preferred.
  • Knowledge of big data technologies (e.g., Microsoft Fabric, Azure Synapse, Lakehouse, Databricks).
  • Familiarity with advanced data orchestration tooling and development frameworks like dbt or Airflow.
  • Experience working in a healthcare-related industry.