AMD - Santa Clara, CA

posted about 1 month ago

Full-time - Senior
Computer and Electronic Product Manufacturing

About the position

The Application Software Optimization Engineer for ML/AI at AMD is a senior-level role focused on optimizing machine learning applications for AMD CPU and GPU platforms. The position involves porting and tuning a variety of scientific applications, engaging with product groups to resolve issues, and developing training materials for internal and external audiences. This role is integral to the Data Center GPU organization, which is dedicated to transforming the industry with AI-based graphics processors.

Responsibilities

  • Port and optimize a variety of machine-learning-based models and applications for AMD CPU and GPU systems
  • Provide domain-specific knowledge to other groups at AMD
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements

  • Experience in multiple scientific computing domains
  • Experience using machine learning techniques in an AI setting
  • Self-motivation and the ability to work well within a team environment
  • Experience working with customers in an engineering role
  • Broad experience building, running and tuning machine learning models
  • In-depth knowledge of current machine learning frameworks and commonly used models for training and inference
  • Strong performance analysis skills for both CPU and GPU
  • Extensive experience with C++ and Python
  • Familiarity with distributed model training via NCCL/RCCL, MPI, or similar network technologies
  • Experience in implementing and optimizing parallel methods on GPU accelerators in distributed memory systems with MPI, CUDA, HIP, OpenMP, etc.
  • Experience in scientific computing disciplines such as computational chemistry, fluid dynamics, weather modeling, and oil and gas applications
  • In-depth understanding of IO, parallel file systems, and network limitations and capabilities as used in AI models
  • Familiarity with installation and setup of various AI applications and machine learning frameworks
  • Experience provisioning clusters and validating their performance for use in machine learning applications
  • Experience with build system tools including Make, CMake, autoconf, and autotools
  • In-depth knowledge of software development practices including debugging, testing, revision control, documentation, and bug tracking
  • Strong team development skills including demonstrated expertise with git and Jira
  • Ability to work well in geographically dispersed teams

Benefits

  • Base pay dependent on skills, qualifications, experience, and location
  • Eligibility for annual bonus or sales incentive
  • Opportunity to own shares of AMD stock
  • Discount when purchasing AMD stock through Employee Stock Purchase Plan
  • Competitive benefits package