Oregon Health & Science University

posted 8 days ago

Full-time - Mid Level
Remote
101-250 employees
Educational Services

About the position

The Data Engineer will play a crucial role in advancing the CDC Foundation's mission by designing, building, and maintaining modern data infrastructure for the Northwest Portland Area Indian Health Board (NPAIHB) Data Hub project. This position is aligned with the Workforce Acceleration Initiative (WAI), which aims to support public health agencies with technology and data expertise. The Data Engineer will collaborate with various stakeholders to develop scalable solutions that enhance the capacity of Tribal public health departments, ensuring effective data management and analysis to improve health outcomes in their communities.

Responsibilities

  • Design a data hub roadmap to streamline secure and reliable data management, including ingestion, processing, and storage through enhancements or implementation of new systems and pipelines.
  • Load data into storage systems or data warehouses, transforming, cleaning, and organizing it with dimensional modeling techniques to ensure accuracy, consistency, and efficient querying.
  • Transform and structure data to ensure it is optimized for use in data visualization software, enabling accurate and effective visual representations of epidemiological data.
  • Collaborate closely with the project epidemiologist to ensure they gain a comprehensive understanding of the data pipeline architecture and data engineering methods to support long-term maintenance and sustainability of the system.
  • Ensure thorough and clear documentation of database architecture and workflows to promote sustainability, consistency, and ease of maintenance.
  • Define business rules around data governance for the Data Hub.
  • Apply rigorous data quality checks and validation processes to guarantee the accuracy and reliability of the data released, emphasizing the importance of delivering correct and trustworthy data to support public health initiatives.
  • Optimize data pipelines, infrastructure, and workflows for performance and scalability.
  • Monitor data pipelines and systems for performance issues, errors, and anomalies, and implement solutions to address them.
  • Analyze and interpret datasets to identify data management needs and advise on data management strategy.
  • Implement security measures to protect sensitive information.
  • Collaborate with epidemiologists, analysts, and other partners to understand current and future data needs and requirements, and to ensure that the data infrastructure supports the organization's goals and objectives.
  • Implement and maintain ETL processes to ensure the accuracy, completeness, and consistency of data.
  • Design and manage data storage systems, including migration of SAS datasets to a PostgreSQL relational database.
  • Provide technical guidance to other staff on preparing and structuring data for visualization, leveraging knowledge of visualization tools to support the creation of meaningful and insightful visual outputs.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Data Science, or a related field.
  • Minimum of five (5) years of related informatics experience, preferably with three (3) years of experience in a lead data engineer position.
  • Demonstrated expertise in building SQL relational databases and transitioning non-relational data into a structured relational format, ensuring seamless integration and optimized performance.
  • Proficiency in SQL programming and other languages commonly used in data engineering, such as Python, Java, or Scala.
  • Experience transforming and preparing data into formats suitable for data visualization software, ensuring it is structured for optimal use in dashboards and other visual outputs.
  • Strong understanding of database systems, including relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra), with PostgreSQL preferred.
  • Experience with engineering best practices such as source control, automated testing, continuous integration and deployment, and peer review.
  • Knowledge of data warehousing concepts and tools.
  • Experience with cloud computing platforms, with a preference for experience in an AWS environment.
  • Expertise in data modeling, ETL (Extract, Transform, Load) processes, and data integration techniques.
  • Strong analytical thinking and problem-solving abilities.
  • Excellent verbal and written communication skills, including the ability to convey technical concepts to non-technical partners effectively.
  • Flexibility to adapt to evolving project requirements and priorities.
  • Outstanding interpersonal and teamwork skills, and the ability to develop productive working relationships with colleagues and partners.
  • Experience working in a virtual environment with remote partners and teams.

Nice-to-haves

  • Experience facilitating data requirements gathering sessions to support data modeling plans.
  • Experience planning and designing database models based on business data requirements.
  • Experience working with complex public health, health care, or other non-business data requiring advanced processing and analysis techniques.
  • Experience transitioning SAS datasets and analyses into relational database structures.
  • Experience building data pipelines within Amazon Web Services (AWS), using services such as Amazon Relational Database Service (RDS), Amazon Aurora Serverless, AWS Glue, and AWS Lambda.
  • Experience creating complex fields and visuals in Amazon QuickSight or similar data visualization tools (Tableau, Microsoft Power BI, etc.).
  • Experience with dimensional modeling in scenarios where dimensions and fields change over time.
  • Experience with implementing data suppression techniques and familiarity with HIPAA, PHI, and other data confidentiality regulations.

Benefits

  • Fully remote work arrangement for U.S.-based candidates.