Merck KGaA Darmstadt Germany - North Wales, PA
As a Senior Specialist in Data Engineering at Merck & Co., Inc., you will play a crucial role in designing, developing, and maintaining data pipelines that extract data from various sources to populate our data lake and data warehouse. You will collaborate with the data governance team to implement data quality checks and maintain data catalogs, ensuring the integrity and usability of our data assets.

You will use orchestration, logging, and monitoring tools to build resilient data pipelines, applying test-driven development when building ELT/ETL pipelines. A strong understanding of concepts such as data lakes, data warehouses, lakehouses, data meshes, and data fabrics is essential for this role.

In addition to pipeline development, you will develop data models for cloud data warehouses such as Redshift and Snowflake and build the pipelines that ingest data into them. You will analyze data using SQL and collaborate with Data Analysts, Data Scientists, and Machine Learning Engineers to identify and transform data for ingestion, exploration, and modeling.

You will work with serverless AWS services such as Glue, Lambda, and Step Functions, and use Terraform to deploy infrastructure on AWS. Containerizing Python code with Docker is also a key part of the role, along with version control in Git and familiarity with common branching strategies. You will build pipelines that handle large datasets using PySpark, develop proofs of concept in Jupyter Notebooks, and create technical documentation as needed. This position requires a proactive approach to problem-solving and a commitment to maintaining high standards of data quality and governance.
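To illustrate the kind of PySpark ELT work and data-quality checks described above, here is a minimal sketch of a raw-to-curated lake ingestion job. The bucket paths, table name, and column names (order_id, order_timestamp, cust_id) are hypothetical placeholders and are not taken from this listing; an actual pipeline would also involve orchestration, logging, and catalog updates as outlined in the role.

```python
# Illustrative sketch only: a simple PySpark ELT job that reads raw data,
# applies a basic data-quality check, and writes curated Parquet to a lake.
# All paths and column names are assumptions for the example.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-raw-to-lake").getOrCreate()

# Extract: read raw CSV landed by an upstream source (path is hypothetical).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-raw-zone/orders/")
)

# Data-quality check in the spirit of the governance responsibilities:
# fail the batch if the business key is ever null.
null_keys = raw.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"Data quality check failed: {null_keys} rows with null order_id")

# Transform: normalize column names and derive a partition column.
curated = (
    raw.withColumn("order_date", F.to_date("order_timestamp"))
       .withColumnRenamed("cust_id", "customer_id")
)

# Load: write partitioned Parquet into the curated zone of the data lake.
(
    curated.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-zone/orders/")
)

spark.stop()
```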