AMD - Boxborough, MA

posted about 1 month ago

Full-time - Senior
Boxborough, MA
Computer and Electronic Product Manufacturing

About the position

AMD's Data Center GPU organization is at the forefront of transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, Artificial Intelligence (AI), High-Performance Computing (HPC), and embedded systems.

We are looking for talented and highly motivated computational scientists and engineers to join our team of developers preparing applications for AI platforms across the globe. This position is for a senior-level application optimization engineer in AI, focusing on optimizing machine learning applications. You will be part of a team porting and tuning a wide variety of scientific applications for AMD CPU and GPU platforms. As an engineer, computational scientist, or physicist with experience in multiple scientific computing domains, you will be expected to apply machine learning techniques in an AI setting. The ideal candidate is self-motivated and works well in a team environment.

Responsibilities

  • Port and optimize a variety of machine-learning-based models and applications for AMD CPU and GPU systems
  • Provide domain-specific knowledge to other groups at AMD
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements

  • Broad experience building, running, and tuning machine learning models
  • In-depth knowledge of current machine learning frameworks and commonly used models for training and inference
  • Strong performance analysis skills for both CPU and GPU
  • Extensive experience with C++ and Python
  • Familiarity with distributed model training using NCCL/RCCL, MPI, or similar communication libraries
  • Experience in implementing and optimizing parallel methods on GPU accelerators in distributed memory systems with MPI, CUDA, HIP, OpenMP, etc.
  • Experience in scientific computing disciplines such as computational chemistry, fluid dynamics, weather modeling, and oil and gas applications
  • In-depth understanding of I/O, parallel file systems, and network limitations and capabilities as they affect AI models
  • Familiarity with installation and setup of various AI applications and machine learning frameworks
  • Experience provisioning clusters and validating their performance for use in machine learning applications
  • Experience with build system tools including Make, CMake, autoconf, and autotools
  • In-depth knowledge of software development practices including debug, test, revision control, documentation, and bug tracking
  • Strong team development skills including demonstrated expertise with git and Jira
  • Ability to work well in geographically dispersed teams

Benefits

  • Base pay depending on skills, qualifications, experience, and location
  • Eligibility for incentives such as annual bonus or sales incentive
  • Opportunity to own shares of AMD stock
  • Discount when purchasing AMD stock through the Employee Stock Purchase Plan
  • Competitive benefits