AMD - Santa Clara, CA

posted 3 months ago

Full-time - Mid Level
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Datacenter GPU Software Product Applications Engineer at AMD plays a crucial role in the technical execution of AMD's Datacenter graphics hardware and software subsystem projects. This position is designed for individuals who are passionate about applying their expertise in graphics, compute, datacenter technologies, and AI/Machine Learning to support AMD's OEM partners and enterprise commercial end-customers. The engineer will work collaboratively with customers utilizing AMD Instinct Accelerators, ensuring that the products meet the high standards expected in the industry. This role not only involves technical skills but also requires strong program management capabilities to navigate complex projects effectively.

In this position, the engineer will be responsible for porting and optimizing a variety of AI and machine learning models and applications specifically for AMD GPU and CPU systems. This includes providing domain-specific knowledge to various groups within AMD, engaging with product teams to resolve application and customer issues, and developing training materials for both internal audiences and external stakeholders, including presentations at industry conferences. The role is integral to driving the evolution of computing experiences through innovative AI-powered products.

The ideal candidate will be a computational scientist or physicist with a robust background in scientific computing and machine learning techniques. They should be self-motivated and thrive in a team-oriented environment, contributing to AMD's mission of transforming lives through technology. The position offers a unique opportunity to work at the forefront of AI and machine learning applications in the datacenter space, making a significant impact on the industry.

Responsibilities

  • Port and optimize a variety of AI and machine learning based models and applications for AMD GPU and CPU systems
  • Provide domain-specific knowledge to other groups at AMD
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements

  • 10-15 years of relevant industry experience
  • Master's or PhD in Computer Science, Computational Physics, Engineering or related subjects, or equivalent experience
  • Broad experience building, running and tuning AI and machine learning models
  • In-depth knowledge of current machine learning frameworks and commonly used models for training and inference
  • Strong performance analysis skills for both GPU and CPU
  • Extensive experience with C++ and Python
  • Familiarity with distributed model training via NCCL/RCCL, MPI, or similar network technologies
  • Experience in implementing and optimizing parallel methods on GPU accelerators in distributed memory systems with MPI, CUDA, HIP, OpenMP, etc.
  • Familiarity with scientific computing disciplines such as computational chemistry, fluid dynamics, weather modeling, and oil and gas applications
  • In-depth understanding of IO, parallel file systems, and network limitations and capabilities as used in AI models
  • Familiarity with installation and setup of various AI applications and machine learning frameworks
  • Experience provisioning clusters and validating their performance for use in machine learning applications
  • Experience with build system tools including Make, CMake, autoconf, and autotools
  • In-depth knowledge of software development practices including debug, test, revision control, documentation, and bug tracking
  • Strong team development skills including demonstrated expertise with git and JIRA
  • Ability to work well in geographically dispersed teams

Nice-to-haves

  • Experience working with customers in a support function
  • Strong written and verbal communication skills and knowledge of program management practices
  • Self-starter, detail-oriented, organized, and capable of multi-tasking in a fast-moving environment
  • Motivated to provide highly responsive support as needed and to work independently

Benefits

  • Base pay depending on skills, qualifications, experience, and location
  • Eligibility for annual bonus or sales incentive
  • Opportunity to own shares of AMD stock
  • Discount when purchasing AMD stock through Employee Stock Purchase Plan
  • Competitive benefits package