Sr. Machine Learning Performance Engineer

AMD - San Jose, CA

posted 4 months ago

Full-time - Mid Level

Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, and as a Sr. Machine Learning Performance Engineer, you will play a crucial role in this mission. This position focuses on ML performance modeling, projection, and optimization for various machine learning workloads, while also participating in hardware and software co-design. You will analyze the interaction between ML workloads and hardware architecture, particularly modeling workloads such as generative AI models across multiple hardware configurations. Your insights and recommendations will be vital in shaping the future of AI acceleration and ensuring that our products meet the evolving needs of our customers.

In this role, you will be responsible for analyzing and exploring recent machine learning models, understanding their compute and memory requirements, and providing projections on various compute hardware for both inference and training. You will also be tasked with identifying innovative ways to enhance performance. Your work will involve performance modeling and analysis of ML training and inference workloads across single and multiple accelerators, exploring various trade-offs and design decisions.

You will actively participate in hardware-software co-design efforts aimed at optimizing future hardware for various ML workloads. Effective communication is key in this role, as you will need to present the results of your performance analysis and modeling to stakeholders, providing concrete recommendations based on your findings. Additionally, you will contribute to the development and improvement of our framework, tools, and infrastructure for performance estimation, modeling, and reporting. Collaboration across teams will be essential to ensure that we are aligned in our goals and strategies.

Responsibilities

Model and analyze the performance of ML training and inference workloads across single and multiple accelerators.
Explore various trade-offs and design decisions related to ML workloads.
Participate in hardware-software co-design to optimize future hardware for various ML workloads.
Communicate and present the results of the performance analysis and modeling to stakeholders and provide concrete recommendations.
Develop and improve our framework, tools, and infrastructure for performance estimation, modeling, and reporting.
Engage in cross-team collaboration.

Requirements

Strong experience with ML hardware architecture, software optimization, and performance modeling.
Excellent written, verbal, and presentation skills.
Proficiency in C++ coding.
PhD or Master's degree in computer science, electrical engineering, or a related field, plus equivalent experience.

Nice-to-haves

Experience with generative AI models.
Familiarity with performance estimation tools and methodologies.

Benefits

Base pay dependent on skills, qualifications, experience, and location.
Eligibility for annual bonuses or sales incentives.
Opportunity to buy shares of AMD stock at a discount through the Employee Stock Purchase Plan.
Competitive benefits package.
