AMD - Santa Clara, CA

posted about 2 months ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, and this role is pivotal in achieving that mission. As a Fellow/PMTS Data Center GPU Optimization Engineer, you will be part of a world-class team dedicated to enabling applications for AI/ML and High-Performance Computing (HPC). Our Data Center GPU organization is at the forefront of innovation, focusing on designing exceptional products that drive the evolution of computing experiences across various sectors, including enterprise data centers, AI, HPC, and embedded systems.

In this senior role, you will be responsible for porting and optimizing a variety of machine learning applications for AMD CPU and GPU platforms. This involves working closely with a team of computational scientists and engineers to enhance the performance of scientific applications, ensuring they leverage the full capabilities of our cutting-edge hardware. You will also engage with other AMD product groups to resolve application and customer issues, providing your domain-specific knowledge to enhance our offerings.

Your contributions will extend beyond technical optimization; you will develop and present training materials to internal teams, customers, and at industry conferences, sharing your expertise and promoting AMD's innovative solutions. This position requires a self-motivated individual who thrives in a collaborative environment and is passionate about pushing the boundaries of technology to solve complex challenges in AI and HPC.

Responsibilities

  • Port and optimize a variety of machine-learning-based applications for AMD CPU and GPU systems
  • Provide domain-specific knowledge to other groups at AMD
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements

  • Master's or PhD in Computer Science, Computational Physics, Engineering, or a related subject, or equivalent experience
  • Experience in multiple scientific computing domains
  • Experience using machine learning techniques in an AI/HPC setting
  • Self-motivated, with the ability to work well within a team environment
  • Expert-level, hands-on experience in networking, storage, and cluster design, modelling, and analytics
  • Solid grounding in current AI/ML frameworks and deep understanding of the ecosystem
  • Extensive experience with and mastery of Python and one systems language, preferably C++
  • Working experience with distributed pre-training, fine-tuning and inference
  • Familiarity with orchestrators and resource managers such as Slurm and Kubernetes (k8s)
  • Broad experience creating, adapting, and running workloads with widely used HPC applications
  • Strong performance analysis skills for both CPU and GPU
  • Experience working with large customers, and excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives

Nice-to-haves

  • Deep understanding of distributed systems and ability to dive deep into individual components such as compute, network and storage
  • Publications at NeurIPS, ICML, or equivalent venues
  • Thought leadership, patents, and other publications
  • Experience working at the k8s scheduler level
  • In-depth HPC knowledge
  • Ability to work well in geographically dispersed teams

Benefits

  • Base pay depending on skills, qualifications, experience, and location
  • Eligibility for incentives such as annual bonuses or sales incentives
  • Opportunity to own shares of AMD stock
  • Discount when purchasing AMD stock through Employee Stock Purchase Plan
  • Competitive benefits package