Verily - San Bruno, CA

posted 4 days ago

Full-time - Mid Level
San Bruno, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

As a Data Scientist on our Registries and RWD (real-world data) team, you will support our core mission to drive innovation in evidence generation for research and care decisions. We are building new types of longitudinal datasets that are grounded in RWD sources, such as EHRs (electronic health records) and claims data, and augmented with prospective data collection. You will develop and deploy models that enable scalable curation of RWD. This will include multi-source integrations and reconciliations, creating derived features from the source data (e.g., abstraction of clinical concepts from unstructured data), and facilitating data quality assessments. You will work with a diverse cross-functional team to build reusable and scalable tools and to deliver products that unlock information from structured and unstructured clinical data.

Responsibilities

  • Work closely with cross-functional partners to design and create longitudinal datasets integrating multiple data sources.
  • Build and implement highly accurate machine learning models / AI tools using sparsely labeled healthcare datasets.
  • Implement, build on, and augment existing LLM/NLP tools to maximize the value of unstructured medical data across a range of research and care applications.
  • Become an expert in our data's capabilities and limitations. Solve difficult, non-routine analysis problems, handling data challenges from a real-world setting.
  • Communicate technical methods and results clearly in well-structured reports and presentations to a range of technical and non-technical audiences.

Requirements

  • Advanced degree in a quantitative discipline (e.g., data sciences, statistics, biomedical informatics, computer science, applied mathematics, or similar), or equivalent practical experience.
  • 2+ years of experience applying advanced machine learning and AI techniques (supervised and unsupervised methods, LLMs, NLP) to clinical data.
  • Direct experience working with and curating real-world data, such as EHRs, including a deep understanding of the complexities of structured and unstructured clinical data.
  • Strong proficiency in Python.

Nice-to-haves

  • Familiarity with medical terminologies and ontologies.
  • Familiarity with software engineering practices and experience developing production software.
  • Experience working with clinical subject matter experts.
  • Ability to work cross-functionally on teams, with a tolerance for ambiguity.
  • Creative and methodical problem solving: understand needs, identify options, form hypotheses, generate robust results, make informed decisions, and learn faster through feedback.

Benefits

  • Bonus
  • Equity
  • Health benefits