Machine Learning Engineer, Training

$158,000 - $200,000/Yr

Waymo - Mountain View, CA

posted about 1 month ago

Full-time - Mid Level

Administrative and Support Services

About the position

The Machine Learning Engineer, Training at Waymo is responsible for developing infrastructure components necessary for distributed training of machine learning models, particularly in the context of autonomous driving technology. This role involves implementing automation solutions, monitoring system health, diagnosing issues, and optimizing performance to enhance the developer experience and the efficiency of the ML framework.

Responsibilities

Develop the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization.
Implement automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure to improve operations and reliability.
Monitor system health, diagnose and troubleshoot issues, and perform routine maintenance tasks to ensure the reliability of the distributed training infrastructure.
Identify performance bottlenecks and optimization opportunities.
Improve the developer experience and performance of our scalable ML framework.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field, or 2+ years of equivalent experience.
Understanding of distributed systems principles and experience building distributed systems for production environments.
Solid Python or C++ skills.
Prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms.
Ability to debug complex distributed systems issues.
Experience communicating updates and resolutions to customers and other partners.

Nice-to-haves

Practical familiarity using ML accelerator profiling tools to uncover performance bottlenecks.
Familiarity with cloud computing platforms (e.g., AWS, Azure, GCP) and experience deploying and managing distributed systems in cloud environments.
Knowledge of optimization and deep learning algorithms.

Benefits

Discretionary annual bonus program
Equity incentive plan
Generous company benefits program
