Principal Machine Learning Performance Engineer

$226,400 - $339,600/Yr

AMD - San Jose, CA

posted 2 months ago

Full-time - Principal

Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, enriching industries, communities, and the world. Our mission is to create exceptional products that accelerate next-generation computing experiences, serving as the foundation for data centers, artificial intelligence, PCs, gaming, and embedded systems. The culture at AMD is built on pushing the boundaries of innovation to tackle the world's most pressing challenges. We prioritize execution excellence while fostering a direct, humble, collaborative, and inclusive environment that values diverse perspectives.

We are currently seeking a Principal Machine Learning Performance Engineer who will specialize in ML performance modeling, projection, and optimization for various machine learning workloads. This role involves participating in hardware and software co-design, focusing on the interaction between ML workloads and hardware architecture. The engineer will model workloads, including generative AI models across multiple hardware configurations, and provide summarized recommendations based on their findings. In this position, you will collaborate with both customers and business units to project performance, analyze results, and develop solutions that meet customer needs. If you are passionate about performance optimization and eager to maximize hardware capabilities while shaping the future of AI acceleration, this role is an excellent fit for you.

As a Machine Learning Performance Engineer, you will analyze and explore recent machine learning models, understand their compute and memory requirements, and provide projections on various compute hardware for both inference and training. You will also be tasked with identifying innovative ways to enhance performance.

Responsibilities

Performance modeling and analysis of ML training and inference workloads across single and multiple accelerators.
Explore various tradeoffs and design decisions for hardware optimization on ML workloads.
Participate in hardware-software co-design for future hardware optimization.
Communicate and present the results of performance analysis and modeling to stakeholders, providing concrete recommendations.
Develop and improve frameworks, tools, and infrastructure for performance estimation, modeling, and reporting.
Facilitate cross-team collaboration.

Requirements

Strong experience with ML hardware architecture, software optimization, and performance modeling.
Excellent written, verbal, and presentation skills.
Proficiency in C++ coding.
PhD or master's degree in computer science, electrical engineering, or a related field, or equivalent experience.

Nice-to-haves

Experience in performance analysis and projection.
Familiarity with generative AI models and their requirements.

Benefits

Base pay competitive with industry standards.
Eligibility for annual bonuses or sales incentives.
Opportunity to buy AMD stock at a discount through the Employee Stock Purchase Plan.
