MLOps Engineer

$152,000 - $209,000/Yr

Applied Materials - Santa Clara, CA

posted 4 days ago

Full-time - Mid Level
Santa Clara, CA
Machinery Manufacturing

About the position

As an MLOps Engineer, you will be responsible for ensuring the smooth operation of ML pipelines, from model development to deployment and monitoring. You will work across the full lifecycle of ML systems, including CI/CD, model versioning, orchestration, performance tuning, and automation. You will collaborate with cross-functional teams to design and implement scalable, reliable, and efficient ML infrastructure solutions that enable rapid experimentation and deployment of machine learning models.

Responsibilities

  • Act as a liaison with a sub-group within a business unit or a GIS Domain area for business and MLOps technology strategy alignment, solution discovery, service management, and project portfolio management.
  • Convert business requirements and/or issues into functional and technical specifications.
  • Assist in the design and technical development of complex MLOps solutions to meet business needs.
  • Perform and document application and platform configuration.
  • Prepare and execute test scenarios and scripts (unit, integration, performance, regression, acceptance) and data integration.
  • Participate in new technology evaluations.
  • Guide junior staff and contingent workers to adhere to GIS project management, software application development, testing, service management, change management, RCA, and other relevant processes, standards, governance, and controls.
  • Manage the execution of SOX controls and testing, and support internal and external audits.
  • Plan and manage small to medium-sized MLOps projects to ensure effective and efficient execution in line with guardrails of scope, timeline, budget, and quality.
  • Serve as an MLOps team lead on medium to large cross-functional application processes.
  • Manage contingent workers performing MLOps project and/or support services.
  • Select, onboard, and offboard contingent workers in a timely manner.
  • Manage contingent worker project/task assignments and ensure work product quality.
  • Approve contingent worker timesheets/costs.
  • Collaborate with Data Scientists to deploy machine learning models into production environments.
  • Design and implement CI/CD pipelines to automate the training, validation, and deployment of models.
  • Ensure seamless integration of models with backend systems and cloud infrastructure.
  • Build and maintain scalable infrastructure for ML workflows using cloud platforms (AWS/GCP/Azure).
  • Manage containerized environments (Docker, Kubernetes) for model deployment and scaling.
  • Optimize model serving environments for low-latency and high-availability needs.
  • Implement and maintain monitoring and logging systems to track model performance and identify issues in real time.
  • Ensure model performance is aligned with business goals and continually improve model retraining cycles.
  • Implement auto-scaling and fault-tolerant mechanisms to ensure high availability of ML services in production.
  • Work closely with data scientists, software engineers, and product teams to ensure alignment on model requirements, performance, and deployment strategy.
  • Provide guidance and best practices for maintaining model quality and optimizing deployment processes.
  • Assist in creating documentation for the deployment pipeline, model versioning, and infrastructure.
  • Develop scripts and tools to automate repetitive tasks within the ML lifecycle (data collection, preprocessing, retraining).
  • Implement security best practices for data privacy, access control, and model integrity in production environments.
  • Ensure compliance with relevant industry regulations for ML operations.

Requirements

  • Proficient in Python or other scripting languages.
  • Experience with ML model deployment frameworks such as TensorFlow Serving or custom REST APIs.
  • Strong knowledge of containerization technologies (Docker, Kubernetes) and cloud platforms (AWS, GCP, Azure).
  • Familiarity with CI/CD tools and version control systems (Git).
  • Understanding of ML lifecycle tools (Kubeflow, MLflow) is a plus.
  • Experience with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog).
  • 4+ years of experience working in MLOps, DevOps, or related roles in a production environment.
  • Demonstrated experience deploying and maintaining machine learning models at scale in production.
  • Knowledge of model performance monitoring, A/B testing, and model retraining strategies.
  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field, or equivalent practical experience.

Nice-to-haves

  • Relevant certifications (e.g., AWS Certified Machine Learning, Google Cloud Professional Machine Learning Engineer).

Benefits

  • Comprehensive benefits package, including participation in bonus and stock award programs.