AMD - Austin, TX

posted 5 months ago

Full-time - Mid Level
Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, and as an AI Software Engineer in our Data Center GPU organization, you will play a crucial role in this mission. Our team is dedicated to designing exceptional products that drive the evolution of computing experiences, particularly in the realms of enterprise data centers, artificial intelligence, high-performance computing (HPC), and embedded systems. We are looking for talented and highly motivated individuals to join us in pushing the boundaries of efficiency and performance in software development for next-generation GPU computational accelerators.

As part of our world-class team, you will be responsible for enabling software solutions for some of the most powerful supercomputers and data centers in the industry. You will work closely with sophisticated clients to help them leverage the latest hardware capabilities for their AI use cases. This role offers a unique opportunity to combine cutting-edge hardware with the latest applications, libraries, frameworks, and SDKs, allowing you to contribute to solving some of the world's most complex challenges.

In this position, you will collaborate with a team of software engineers to enable deep learning models, libraries, and applications for Instinct GPUs in both on-premises and cloud environments. You will need to have strong programming skills in Python and/or C++, as well as experience in analyzing and optimizing the performance of AI software. Understanding hardware bottlenecks and how to harness performance to achieve optimal results is essential. We are looking for self-motivated individuals who thrive in a team environment and are eager to learn new skills and methods to enhance the quality and timeliness of AMD software products.

Responsibilities

  • Enable deep learning models, libraries, and applications for Instinct GPUs in on-prem and cloud environments.
  • Analyze and optimize the performance of AI software, addressing hardware bottlenecks.
  • Collaborate with a team of software engineers to push the boundaries of efficiency and performance.
  • Work with industry clients to leverage the latest hardware capabilities for AI use cases.
  • Combine cutting-edge hardware with the latest applications, libraries, frameworks, and SDKs.

Requirements

  • Strong programming skills in C++ and/or Python.
  • Development experience with at least one major deep learning framework such as PyTorch or TensorFlow in inference, fine-tuning, and/or training.
  • BS/MS in Computer Science, Computer Engineering, or a related field with years of relevant experience, or a PhD with years of related experience.
  • Experience with open-source software development and collaboration with community maintainers is a plus.
  • Excellent analytical and problem-solving skills to root-cause and address performance issues.
  • Ability to work independently and as part of a team.
  • Willingness to learn new skills, tools, and methods to improve AMD software products.

Nice-to-haves

  • Expertise in profiling tools across the AI software stack (PyTorch Profiler, ROCm profiler, VTune, Nsight).
  • Experience in implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI).
  • Performance analysis skills for both CPU and GPU.
  • Experience with Singularity, Docker, and/or Kubernetes.
  • Experience in providing clear and timely communication related to project status to the leadership team.

Benefits

  • Base pay dependent on skills, qualifications, experience, and location.
  • Eligibility for annual bonuses or sales incentives.
  • Opportunity to own shares of AMD stock and discounts through the Employee Stock Purchase Plan.
  • Competitive benefits package.