CVS Health - Irving, TX
posted 3 months ago
As a Lead Data Engineer specializing in Machine Learning and Google Cloud Platform (GCP), you will serve as a technical leader, guiding a team of data engineers while collaborating closely with data scientists and analysts. Your primary focus will be to support data-driven decision-making by analyzing complex data structures from various sources and designing large-scale data engineering pipelines. You will develop extensive data structures and pipelines that collect, organize, and standardize data to generate insights and address reporting needs.
In this role, you will implement data ingestion pipelines using APIs, third-party tools, or custom code to ingest high volumes of data into the cloud environment. You will write processes, design database systems, and develop tools for both real-time and offline analytic processing. You will collaborate with product, business, and data science teams to collect user stories, translate them into technical specifications, and implement data transformations, algorithms, and models as automated processes. Strong programming skills in PySpark, Python, Java, or other major languages will be crucial for building robust data pipelines and dynamic systems.
You will also build highly scalable and extensible data marts and data models to support Data Science and other internal customers on the cloud, ensuring that data from various sources meets quality and accessibility standards. Additionally, you will analyze current IT environments to identify critical capabilities and recommend solutions, facilitating machine learning across large-scale systems and campaigns in partnership with data engineering.
The ideal candidate will be detail-oriented, able to understand complex situations quickly and manage multiple urgent tasks, and an open communicator who builds trust and respect.
You will work with data related to a wide range of customer interactions and analytics, designing and deploying large-scale ML models with support from the data engineering and product teams. You will also experiment with available tools and advise on new ones to determine the best solution for each model's requirements.