The Judge Group - Columbus, OH

posted about 2 months ago

Full-time - Mid Level
Columbus, OH
Administrative and Support Services

About the position

The MLOps Engineer will be responsible for the end-to-end productionization and deployment of machine learning models at scale. The role involves close collaboration with data scientists to refine models, along with maintaining MLOps infrastructure, automating deployment pipelines, and ensuring compliance with IT and security standards. The engineer will also develop and maintain APIs and data pipelines so that machine learning model outputs integrate cleanly into a Kafka-based event hub platform.
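
For illustration only, the sketch below shows one common way to publish model output records to a Kafka topic from Python with the confluent-kafka client. The broker address, topic name, and record fields are placeholders rather than details from the posting.

```python
import json

from confluent_kafka import Producer  # assumes the confluent-kafka package is installed

# Placeholder broker; a real deployment would point at the event hub's bootstrap servers.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Surface delivery failures so the pipeline can alert or retry.
    if err is not None:
        print(f"Delivery failed for key {msg.key()}: {err}")

def publish_model_output(record: dict, topic: str = "model-scores") -> None:
    """Serialize one model output record as JSON and publish it to Kafka."""
    producer.produce(
        topic,
        key=str(record.get("id", "")).encode("utf-8"),
        value=json.dumps(record).encode("utf-8"),
        callback=delivery_report,
    )
    producer.flush()  # block until the broker acknowledges the message

# Example: push one scored record onto the placeholder "model-scores" topic.
publish_model_output({"id": 42, "score": 0.87, "model_version": "v3"})
```

In a real pipeline, flushing once per record would usually be replaced by batched flushes or periodic polling for throughput.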

Responsibilities

  • Manage and update Docker images, ensuring they are secure and optimized.
  • Collaborate with data scientists to validate that models run effectively on updated images.
  • Address security vulnerabilities by updating and patching Docker images.
  • Deploy, manage, and scale AWS services (SageMaker, S3, Lambda) using Terraform.
  • Automate the spin-up and spin-down of AWS infrastructure using Terraform scripts.
  • Monitor and optimize AWS resources to ensure cost-effectiveness and efficiency.
  • Design, implement, and maintain CI/CD pipelines in Azure DevOps (ADO).
  • Integrate CI/CD practices with model deployment processes, ensuring smooth productionization of ML models.
  • Participate in the end-to-end process of productionizing machine learning models, from model deployment to monitoring and maintaining their performance.
  • Work with large language models, focusing on implementing near real-time and batch inference (an illustrative sketch follows this list).
  • Address data drift and model drift in production environments.
  • Collaborate with data scientists to refine and document model output schemas using Swagger for downstream API development.
  • Automate data transfers (data pipelines) to Kafka using FTP/SFTP or Kafka APIs.
  • Develop and maintain batch and real-time APIs for model output integration.
  • Work with Kafka engineers to ensure accurate data publishing and monitor for reliability.
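
As a rough illustration of the near real-time inference work listed above, and not a procedure taken from the posting, the sketch below calls an already-deployed SageMaker endpoint through boto3. The endpoint name, region, and payload shape are hypothetical placeholders.

```python
import json

import boto3  # assumes AWS credentials are already configured in the environment

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

def predict(features: dict, endpoint_name: str = "my-model-endpoint") -> dict:
    """Send one JSON payload to a deployed SageMaker endpoint and return its prediction."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,       # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(features),
    )
    return json.loads(response["Body"].read())

print(predict({"feature_a": 1.2, "feature_b": 0.4}))
```

A batch variant would typically read records from S3, score them in bulk (for example with SageMaker batch transform), and write the results back to S3 or publish them to Kafka.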

Requirements

  • Deep expertise in Python for scripting and automation.
  • Strong experience with AWS services, particularly SageMaker, S3, and Lambda.
  • Proficiency in using Terraform for infrastructure-as-code on AWS.
  • Extensive experience with Docker, including building, managing, and securing Docker images.
  • Strong command-line skills in Linux, especially for Docker and system management.
  • Significant experience in setting up and managing CI/CD pipelines in Azure DevOps (ADO).
  • Proficient in using Git for version control and collaboration.
  • Proven experience in developing and managing both batch and real-time APIs, preferably in a Kafka-based event-driven architecture.
  • Expertise in API development, including both batch and real-time data processing.
  • Exposure to API documentation tools like Swagger.
  • Strong understanding of schema design and data serialization formats such as JSON.
  • Experience with Jenkins or other CI/CD tools is a plus.
  • 4 years of experience in a combination of MLOps, DevOps, and Data Engineering.
  • Bachelor's degree in Computer Science, Engineering, or a related discipline.

Nice-to-haves

  • Experience with large language models and productionizing ML models in a cloud environment.
  • Exposure to near real-time inference systems and batch processing in ML.
  • Familiarity with data drift and model drift management (see the sketch after this list).
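
Purely as an illustration of what drift monitoring can look like, and not a method specified in the posting, the sketch below flags data drift on one numeric feature by comparing a reference sample against recent production data with SciPy's two-sample Kolmogorov-Smirnov test; the threshold and synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(reference: np.ndarray, production: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True when the production distribution differs significantly from the reference."""
    _, p_value = ks_2samp(reference, production)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time feature sample
production = rng.normal(loc=0.3, scale=1.0, size=5_000)  # recent production sample
print(feature_drifted(reference, production))  # True: the feature's mean has shifted
```

Model drift is usually tracked the same way, but on the model's outputs or on live accuracy measured against delayed ground truth.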

Benefits

  • Negotiable salary
  • Contract-to-hire opportunity
  • Hybrid work schedule (3 days onsite, 2 days remote)