Wake Forest Baptist Health - Winston-Salem, NC

posted about 2 months ago

Full-time - Mid Level
Remote - Winston-Salem, NC
5,001-10,000 employees
Hospitals

About the position

The Office of Informatics in the Clinical and Translational Sciences Institute serves the research community of the Wake Forest School of Medicine by providing the analytic services and resources necessary to support our academic learning health system. Our data engineers are the foundation for everything we do, making sure that all the ‘big data’ from across our enterprise is accurate, clean, and readily available in common data models and specifications. They work closely with the rest of our tight-knit data team, who rely on their work to transform clinical and educational data into research-ready datamarts and reporting tools that drive operational efficiencies and improved patient care. The Office of Informatics is based out of newly renovated offices at the Innovation Quarter in downtown Winston-Salem and operates on a hybrid model that supports remote work options.

In this role, you will design, build, implement, and maintain data processing pipelines for the extraction, transformation, and loading (ETL) of data from various data sources. You will develop robust and scalable solutions that transform data into useful formats for analysis and data sharing, enhance data flow, and enable end users to consume, analyze, and share data more quickly and easily. Writing complex SQL queries to support analytics needs will be a key part of your responsibilities. You will evaluate and recommend tools and technologies for data infrastructure and processing, collaborating with statisticians, data scientists, programmers, data analysts, product teams, and other stakeholders to translate business requirements into technical specifications and coded data pipelines. You will work with structured and unstructured data from a variety of data stores, such as data lakes, relational database management systems, and data warehouses.

Responsibilities

  • Design, build, implement, and maintain data processing pipelines for ETL of data from various sources.
  • Develop robust and scalable solutions that transform data into useful formats for analysis and data sharing.
  • Write complex SQL queries to support analytics needs.
  • Evaluate and recommend tools and technologies for data infrastructure and processing.
  • Collaborate with statisticians, data scientists, programmers, data analysts, product teams, and other stakeholders to translate business requirements to technical specifications and coded data pipelines.
  • Work with structured and unstructured data from various data stores, such as data lakes, relational database management systems, and/or data warehouses.
  • Build custom ingestion pipelines to incorporate data from novel sources such as outputs from machine learning models.
  • Ensure visibility into the status of automated data tasks to catch mistakes before they become problems.
  • Collaborate with other members of the data team to improve performance and stability of transformation tasks.
  • Participate in design conversations for improving the architecture of our data infrastructure.
  • Support the team in identifying and implementing data integration and quality control strategies to improve data quality and availability.
  • Prepare research data for ingestion and conversion to a unified data standard using ETL and automation tools.
  • Assist the team in maintaining the database environment by creating views/queries, documentation, and data pipelines.
  • Own data integrity, availability, documentation, and efficient access to data.
  • Identify opportunities for process improvement in the end-to-end data development and delivery lifecycle.
  • Incorporate automation wherever possible to improve access to data and analyses.

Requirements

  • Bachelor's degree and 4+ years of experience or an equivalent combination of education and experience in computer programming.
  • Strong initiative and proven ability to work independently.
  • Moderate proficiency in the discipline, with the ability to conduct work assignments of increasing complexity under moderate supervision and with some latitude for independent judgment.
  • Experience with data replication tools or services such as Meltano, Airbyte, Fivetran, or Stitch.
  • Experience with orchestration tools such as Airflow, Luigi, Prefect, or Dagster.
  • Experience using scalable and distributed compute, storage, and networking resources such as those provided by Azure, especially in the context of Microsoft Fabric.
  • Experience with code versioning systems such as Git.
  • Knowledge of common file formats for analytic data workloads like Parquet, ORC, or Avro.
  • Knowledge of high-performance table formats such as Apache Iceberg or Delta Lake.
  • Additional consideration given for experience with tools, languages, data processing frameworks, and databases such as R, Python, SQL, MongoDB, Redis, Hadoop, Spark, Hive, Scala, Bigtable, Cassandra, Presto, Storm.
  • Ability to communicate on a professional level with customers and staff.
  • Superior problem-solving skills.

Nice-to-haves

  • Experience with healthcare and/or biomedical research operations and systems is a plus.

Benefits

  • 401(k) Plan
  • Health Insurance
  • Vacation & Paid Time Off
  • Flexible schedule
  • Remote work options
  • Competitive Pay
  • Good benefits
  • Supportive management
  • Work/life balance