OneSource Regulatory - McKinney, TX

posted 3 days ago

Full-time
McKinney, TX
Professional, Scientific, and Technical Services

About the position

The Data Engineer position at OneSource Regulatory Technology involves working on R&D projects to enhance data performance in the pharmaceutical sector. The role focuses on extracting, cleaning, normalizing, and loading data from various sources into databases, ensuring data integrity and quality throughout the process.

Responsibilities

  • Parse and synthesize XML and/or JSON documents.
  • Collect data using intermediate to advanced web scraping and retrieval techniques, including fetching data via SFTP, FTP, wget, curl, REST APIs, and GraphQL queries.
  • Use Linux command-line tools such as grep, wc, sed, awk, find, ls, and cat, along with light Bash scripting and cron scheduling.
  • Work with SQL databases including PostgreSQL, MySQL, and Google BigQuery.
  • Work with NoSQL databases such as MongoDB or similar.
  • Work with basic cloud technologies such as storage buckets and serverless functions.
  • Extract text and images from PDF files.
  • Use Puppeteer or similar browser automation tools.
  • Understand JavaScript, HTML/CSS, and HTTP methods for web scraping.

Requirements

  • 4+ years of experience as a data engineer.
  • Solid experience with Python and libraries such as Pandas and requests.
  • Basic knowledge of SQL and NoSQL databases.
  • Familiarity with cloud technologies and web scraping tools.
  • Strong English communication skills and attention to detail.

Nice-to-haves

  • Experience in the pharmaceutical space.
  • Ability to expose data via C# .NET Core and/or GraphQL.
  • Experience with Google Cloud Platform services.
  • Knowledge of Python multithreading and data manipulation techniques.
  • Familiarity with Docker and Kubernetes for data processing.