Senior Data Engineer

$87,069 - $148,017/Yr

Unclassified

posted about 2 months ago

Full-time - Senior
5,001-10,000 employees

About the position

ICF is a mission-driven company filled with people who care deeply about improving the lives of others and making the world a better place. Our core values include Embracing Difference; we seek candidates who are passionate about building a culture that encourages, embraces, and hires dimensions of difference. Our Health Engineering Systems (HES) team works side by side with customers to articulate a vision for success, and then make it happen. We know success doesn't happen by accident. It takes the right team of people, working together on the right solutions for the customer. We are looking for a seasoned Senior Data Engineer who will be a key driver in making this happen.

In this role, you will design, develop, and maintain scalable data pipelines using Spark, Hive, and Airflow. You will also develop and deploy data processing workflows on the Databricks platform and create API services to facilitate data access and integration. Your responsibilities will include creating interactive data visualizations and reports using AWS QuickSight, building the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using AWS and SQL technologies, and monitoring and optimizing the performance of data infrastructure and processes.

You will develop data quality and validation jobs, assemble large, complex data sets that meet functional and non-functional business requirements, and write unit and integration tests for all data processing code. Collaboration is key in this position: you will work with DevOps engineers on CI/CD and infrastructure as code (IaC), read specifications and translate them into code and design documents, perform code reviews, and develop processes for improving code quality. You will also improve data availability and timeliness by implementing more frequent refreshes, tiered data storage, and optimizations of existing datasets, while maintaining security and privacy for data at rest and in transit. Other duties may be assigned as needed.

Responsibilities

  • Design, develop, and maintain scalable data pipelines using Spark, Hive, and Airflow
  • Develop and deploy data processing workflows on the Databricks platform
  • Develop API services to facilitate data access and integration
  • Create interactive data visualizations and reports using AWS QuickSight
  • Build required infrastructure for optimal extraction, transformation, and loading of data from various data sources using AWS and SQL technologies
  • Monitor and optimize the performance of data infrastructure and processes
  • Develop data quality and validation jobs
  • Assemble large, complex sets of data that meet non-functional and functional business requirements
  • Write unit and integration tests for all data processing code
  • Work with DevOps engineers on CI/CD and infrastructure as code (IaC)
  • Read specifications and translate them into code and design documents
  • Perform code reviews and develop processes for improving code quality
  • Improve data availability and timeliness by implementing more frequent refreshes, tiered data storage, and optimizations of existing datasets
  • Maintain security and privacy for data at rest and while in transit
  • Other duties as assigned

Requirements

  • Bachelor's degree in computer science, engineering, or a related field
  • 7+ years of hands-on software development experience
  • 4+ years of data pipeline experience using Python, Java, and cloud technologies
  • Candidate must be able to obtain and maintain a Public Trust clearance
  • Candidate must reside in the US, be authorized to work in the US, and work must be performed in the US
  • Must have lived in the US 3 full years out of the last 5 years

Nice-to-haves

  • Experienced in Spark and Hive for big data processing
  • Experience building job workflows with the Databricks platform
  • Strong understanding of AWS products including S3, Redshift, RDS, EMR, AWS Glue, AWS Glue DataBrew, Jupyter Notebooks, Athena, QuickSight, and Amazon SNS
  • Familiarity with building processes that support data transformation, workload management, data structures, dependency management, and metadata
  • Experience with data governance processes to ingest (batch and stream), curate, and share data with upstream and downstream data users
  • An experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up
  • Demonstrated understanding of software and tools including NoSQL and relational SQL databases such as Cassandra and Postgres; workflow management and pipeline tools such as Airflow, Luigi, and Azkaban; stream-processing systems such as Spark Streaming and Storm; and object-oriented/functional scripting languages including Scala, C++, Java, and Python
  • Familiarity with DevOps methodologies, including CI/CD pipelines (GitHub Actions) and IaC (Terraform)
  • Ability to obtain and maintain a Public Trust clearance while residing in the United States
  • Experience with Agile methodology and test-driven development

Benefits

  • Reasonable Accommodations are available, including, but not limited to, for disabled veterans, individuals with disabilities, and individuals with sincerely held religious beliefs, in all phases of the application and employment process.
  • Benefit offerings included under the Transparency in Coverage Act