ICF International - Reston, VA

posted about 2 months ago

Full-time - Senior
Remote - Reston, VA
5,001-10,000 employees
Administrative and Support Services

About the position

ICF is a mission-driven company filled with people who care deeply about improving the lives of others and making the world a better place. Our core values include Embracing Difference; we seek candidates who are passionate about building a culture that encourages, embraces, and hires dimensions of difference. Our Health Engineering Systems (HES) team works side by side with customers to articulate a vision for success, and then make it happen. We know success doesn't happen by accident. It takes the right team of people, working together on the right solutions for the customer. We are looking for a seasoned Senior Data Engineer who will be a key driver to make this happen.

As a Senior Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines using technologies such as Spark, Hive, and Airflow. You will develop and deploy data processing workflows on the Databricks platform and create API services to facilitate data access and integration. Your role will also involve creating interactive data visualizations and reports using AWS QuickSight, as well as building the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using AWS and SQL technologies. You will monitor and optimize the performance of data infrastructure and processes, develop data quality and validation jobs, and assemble large, complex sets of data that meet both non-functional and functional business requirements.

In addition, you will be responsible for writing unit and integration tests for all data processing code, working with DevOps engineers on CI/CD and IaC, and translating specifications into code and design documents. You will perform code reviews and develop processes for improving code quality, improve data availability and timeliness by implementing more frequent refreshes, tiered data storage, and optimizations of existing datasets, and maintain security and privacy for data at rest and in transit. Other duties may be assigned as necessary.

Responsibilities

  • Design, develop, and maintain scalable data pipelines using Spark, Hive, and Airflow
  • Develop and deploy data processing workflows on the Databricks platform
  • Develop API services to facilitate data access and integration
  • Create interactive data visualizations and reports using AWS QuickSight
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using AWS and SQL technologies
  • Monitor and optimize the performance of data infrastructure and processes
  • Develop data quality and validation jobs
  • Assemble large, complex sets of data that meet non-functional and functional business requirements
  • Write unit and integration tests for all data processing code
  • Work with DevOps engineers on CI/CD and IaC
  • Read specs and translate them into code and design documents
  • Perform code reviews and develop processes for improving code quality
  • Improve data availability and timeliness by implementing more frequent refreshes, tiered data storage, and optimizations of existing datasets
  • Maintain security and privacy for data at rest and while in transit
  • Other duties as assigned

Requirements

  • Bachelor's degree in computer science, engineering, or a related field
  • 7+ years of hands-on software development experience
  • 4+ years of data pipeline experience using Python, Java, and cloud technologies
  • Candidate must be able to obtain and maintain a Public Trust clearance
  • Candidate must reside in the US, be authorized to work in the US, and work must be performed in the US
  • Must have lived in the US for 3 full years out of the last 5 years

Nice-to-haves

  • Experienced in Spark and Hive for big data processing
  • Experience building job workflows with the Databricks platform
  • Strong understanding of AWS products including S3, Redshift, RDS, EMR, AWS Glue, AWS Glue DataBrew, Jupyter Notebooks, Athena, QuickSight, and Amazon SNS
  • Familiar with building processes that support data transformation, workload management, data structures, dependency management, and metadata
  • Experienced in data governance processes for ingesting (batch and streaming), curating, and sharing data with upstream and downstream data users
  • An experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up
  • Demonstrated experience with relevant software and tools, including NoSQL and relational SQL databases such as Cassandra and Postgres; workflow management and pipeline tools such as Airflow, Luigi, and Azkaban; stream-processing systems such as Spark Streaming and Storm; and object-oriented and functional languages including Scala, C++, Java, and Python
  • Familiar with DevOps methodologies, including CI/CD pipelines (GitHub Actions) and IaC (Terraform)
  • Ability to obtain and maintain a Public Trust clearance while residing in the United States
  • Experience with Agile methodology and test-driven development

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401k plan
  • Paid holidays
  • Paid time off
  • Flexible scheduling
  • Professional development opportunities
  • Employee assistance program
  • Tuition reimbursement
  • Wellness programs