Paycom Payroll - Oklahoma City, OK


Full-time
Oklahoma City, OK
Professional, Scientific, and Technical Services

About the position

This position sits within the Development and IT space and works closely with computer scientists, IT, and data scientists to build, deploy, and optimize data pipelines. The role is crucial for integrating the data infrastructure that enables analytics, reporting, and machine learning workloads at scale.

The successful candidate will be responsible for building, testing, and validating robust, production-grade data pipelines that ingest, aggregate, and transform large datasets to the specifications of the internal teams consuming the data. Beyond pipeline development, the role involves creating frameworks and custom tooling for data pipeline code, deploying data pipelines and data connectors to production environments, configuring connections to source data systems, and validating schema definitions with the teams responsible for the source data. Monitoring data pipelines and data connectors, troubleshooting issues as they arise, and ensuring the performance and data integrity of the data lake environment are also key responsibilities.

The engineer will manage data infrastructure such as Kafka and Kubernetes clusters, collaborate with IT and database teams to maintain the overall data ecosystem, and help data science, business intelligence, and other teams make use of the data the pipelines provide. Mentoring junior data engineers and deploying machine learning models to production environments are also part of the role. The engineer will gather requirements, determine the scope of new projects, research and evaluate new technologies, and set up proof-of-concept deployments. Collaboration with data governance and compliance teams is essential to ensure that data pipelines and storage environments meet all necessary requirements. The engineer will also serve as the on-call contact for production issues related to data pipelines and other data infrastructure maintained by the data engineering team.

Responsibilities

  • Build, test, and validate robust production-grade data pipelines that can ingest, aggregate, and transform large datasets.
  • Build frameworks and custom tooling for data pipeline code development.
  • Deploy data pipelines and data connectors to production environments.
  • Configure connections to source data systems and validate schema definitions with the teams responsible for the source data.
  • Monitor data pipelines and data connectors and troubleshoot issues as they arise.
  • Monitor data lake environment for performance and data integrity.
  • Manage data infrastructure such as Kafka and Kubernetes clusters.
  • Collaborate with IT and database teams to maintain the overall data ecosystem.
  • Assist data science, business intelligence, and other teams in using the data provided by the data pipelines.
  • Mentor junior data engineers.
  • Deploy machine learning models to production environments.
  • Gather requirements and determine the scope of new projects.
  • Research and evaluate new technologies and set up proof-of-concept deployments.
  • Test proof-of-concept deployments of new technologies.
  • Collaborate with data governance and compliance teams to ensure data pipelines and data storage environments meet requirements.
  • Serve as on-call for production issues related to data pipelines and other data infrastructure maintained by the data engineering team.

Requirements

  • BS degree in Computer Science or a related field.
  • 5+ years of data engineering work experience.
  • Experience coding in Python, Java, or Scala.
  • Experience with build tools is preferred.
  • Experience building and maintaining data pipelines for batch or stream processing.
  • Experience working in a Unix or Linux environment.
  • Experience with CI/CD tools and processes.
  • Experience with SQL databases is required; experience with NoSQL solutions is preferred.
  • Experience with Docker and Kubernetes is highly preferred.
  • Experience with object storage environments is preferred.
  • Experience with Apache Spark or Apache Flink is preferred.
  • Experience with a streaming platform such as Kafka, Kinesis, or Pulsar is preferred.
  • Experience with data lake query engines such as Presto or Dremio is preferred.
  • Experience with workflow orchestration tools such as Apache Airflow or Dagster is highly preferred.

Nice-to-haves

  • Strong expertise in computer science fundamentals: data structures, algorithms, performance complexity, and the implications of computer architecture for software performance, such as I/O and memory tuning.
  • Working knowledge of software engineering fundamentals: version control systems such as Git and GitHub, development workflows, and the ability to write production-ready code.
  • Knowledge of data architecture and distributed data processing engines such as Spark and Hadoop.
  • Ability to create SQL queries of moderate complexity.
  • Strong troubleshooting skills.
  • Strong technical aptitude.
  • Strong critical thinking skills and the ability to apply them to Paycom's products.
  • Excellent verbal and written communication skills.