Lead Data Engineer - ML/GCP

$118,450 - $236,900/Yr

CVS Health - Irving, TX

posted 3 months ago

Full-time - Mid Level
Irving, TX
Health and Personal Care Retailers

About the position

As a Lead Data Engineer specializing in Machine Learning and Google Cloud Platform (GCP), you will serve as a technical leader, guiding a team of data engineers while collaborating closely with data scientists and analysts. Your primary focus will be to support data-driven decision-making by analyzing complex data structures from various sources and designing large-scale data engineering pipelines. You will develop the data structures and pipelines that collect, organize, and standardize data to generate insights and address reporting needs.

In this role, you will implement data ingestion pipelines using APIs, third-party tools, or custom code to ingest high volumes of data into the cloud environment. You will write processes, design database systems, and develop tools for both real-time and offline analytic processing. You will collaborate with product, business, and data science teams to collect user stories, translate them into technical specifications, and implement data transformations, algorithms, and models as automated processes. Strong programming skills in PySpark, Python, Java, or other major languages will be crucial in building robust data pipelines and dynamic systems.

You will also build highly scalable and extensible data marts and data models to support Data Science and other internal customers on the cloud, ensuring that data from various sources adheres to quality and accessibility standards. Additionally, you will analyze current IT environments to identify critical capabilities and recommend solutions, and facilitate machine learning across large-scale systems and campaigns in partnership with data engineering. The ideal candidate is detail-oriented, quick to understand complex situations, able to manage multiple urgent tasks, and communicates openly to build trust and respect.

You will work with data related to a wide range of customer interactions and analytics, designing and deploying large-scale ML models with support from the data engineering and product teams. You will also experiment with available tools and advise on new ones to determine optimal solutions based on model requirements.

Responsibilities

  • Provide guidance to a team of data engineers and collaborate with data scientists and analysts.
  • Analyze complex data structures from disparate data sources and design large-scale data engineering pipelines.
  • Develop large-scale data structures and pipelines to collect, organize, and standardize data for insights and reporting needs.
  • Implement data ingestion pipelines using APIs, third-party tools, or custom code to ingest high-volume data into the cloud environment.
  • Write processes, design database systems, and develop tools for real-time and offline analytic processing.
  • Collaborate with product, business, and data science teams to collect user stories and translate them into technical specifications.
  • Implement data transformations, algorithms, and models as automated processes.
  • Use programming skills in PySpark, Python, Java, or other major languages to build robust data pipelines and dynamic systems.
  • Build highly scalable and extensible data marts and data models to support Data Science and other internal customers on the cloud.
  • Integrate data from various sources, ensuring adherence to data quality and accessibility standards.
  • Analyze current IT environments to identify critical capabilities and recommend solutions.
  • Facilitate machine learning across large-scale systems and campaigns, ensuring deployment and updates in partnership with data engineering.
  • Develop and participate in presentations and consultations on analytics results and solutions.
  • Interact with internal and external peers and managers to exchange complex information related to areas of specialization.

Requirements

  • 7+ years of progressively complex related experience in cloud data engineering and data analysis.
  • 7+ years of experience in Data Engineering, Analytics, and Machine Learning Systems.
  • 3+ years of building cloud-native analytical products in GCP, Azure, or AWS.
  • Sound knowledge of cloud technology, preferably Google Cloud Platform (GCP).
  • Deep knowledge of large-scale distributed data architecture and performance optimization techniques.
  • Proficiency in developing complex data pipelines, ETLs, and workflows on cloud platforms optimized for high-volume healthcare data.
  • Proficiency in using cloud platforms such as GCP/Azure/AWS, and tools like Composer, Kafka, PySpark, and SQL.
  • Knowledge of the Java and Python programming languages.
  • Proficiency with CI/CD tooling like Jenkins and GitHub to enable robust development pipelines for data and ML.
  • Strong knowledge of large-scale search applications and building high-volume data pipelines, preferably using PySpark on GCP and its native tools such as BigQuery, Airflow, Composer, Dataproc, Pub/Sub, Dataflow, and Vertex AI.
  • Strong foundational knowledge of Agile methodologies.

Nice-to-haves

  • Experience with the healthcare domain is highly desirable.
  • Deep understanding of data warehousing, data architecture, and data modeling methods & best practices.
  • Understanding of AI/ML technology stack.
  • Experience working with large-scale LLMs.
  • Exposure to implementing Gen AI and/or NLP-based solutions using LLMs.
  • Solid experience with Google Cloud data services: BigQuery, Dataproc, Pub/Sub, Cloud Functions, Cloud Storage, Dataflow, Composer.

Benefits

  • Full range of medical, dental, and vision benefits.
  • 401(k) retirement savings plan.
  • Employee Stock Purchase Plan for eligible employees.
  • Fully-paid term life insurance plan for eligible employees.
  • Short-term and long-term disability benefits.
  • Numerous well-being programs.
  • Education assistance and free development courses.
  • CVS store discount and discount programs with participating partners.
  • Paid Time Off (PTO) or vacation pay, as well as paid holidays throughout the calendar year.