Vdart - Dallas, TX

posted 3 months ago

Full-time - Mid Level
Remote - Dallas, TX
1,001-5,000 employees
Professional, Scientific, and Technical Services

About the position

As a Python/PySpark Developer, you will play a crucial role in designing, developing, and deploying scalable data processing pipelines that leverage Python and PySpark. This position is fully remote, allowing you to work from anywhere while contributing to innovative data solutions. You will collaborate closely with data scientists and analysts to gather requirements and translate them into effective technical solutions, and your ability to write efficient, optimized, and secure code will be essential for processing, transforming, and analyzing large volumes of data.

In this role, you will implement data ingestion processes from various sources into the data processing platform, ensuring that data flows seamlessly through the system, and you will create and maintain the data pipelines and workflows that support processing and analytics. Performing data quality checks will be a key part of your responsibilities, as you will need to ensure data integrity throughout the system. You will also troubleshoot and debug production issues, identifying and resolving technical problems as they arise.

Collaboration is vital in this position: you will work with cross-functional teams to ensure that data processing applications integrate seamlessly with other systems. Your contributions will directly impact the efficiency and effectiveness of data-driven decision-making within the organization.

Responsibilities

  • Design, develop, and deploy scalable data processing pipelines using Python and PySpark.
  • Collaborate with data scientists and analysts to understand requirements and translate them into technical solutions.
  • Write efficient, optimized, and secure code to process, transform, and analyze large volumes of data.
  • Implement data ingestion processes from various data sources to the data processing platform.
  • Create and maintain data pipelines and workflows for data processing and analytics.
  • Perform data quality checks and ensure data integrity throughout the system.
  • Troubleshoot and debug production issues to identify and resolve technical problems.
  • Collaborate with cross-functional teams to ensure seamless integration of data processing applications with other systems.

Requirements

  • Bachelor's degree in computer science, engineering, or a related field.
  • 5+ years of experience in Python development with specific experience in PySpark and Apache Spark.
  • 5+ years of experience as a software developer.
  • Strong to expert proficiency in the Python programming language.
  • Expertise in writing and optimizing SQL queries and scripts.
  • Experience with distributed computing frameworks like PySpark and Apache Spark.
  • Hands-on experience using Microsoft Fabric in at least one project.
  • Knowledge of data processing and analytics techniques.
  • Experience with data integration and ETL processes.
  • Familiarity with data storage and querying systems like SQL and NoSQL databases.
  • Understanding of data structures, algorithms, and distributed systems.
  • Excellent problem-solving and analytical skills.
  • Experience in the healthcare and health plan domains, with the ability to apply that domain knowledge to enhance data solutions.
  • Hands-on experience with FHIR.

Nice-to-haves

  • Technical or Team Lead experience is a plus.