AMD - Bellevue, WA

posted about 2 months ago

Full-time - Principal
Computer and Electronic Product Manufacturing

About the position

As a Principal Machine Learning Software Engineer at AMD, you will be at the forefront of transforming lives through advanced technology. Your primary focus will be on low-level performance optimization, which is crucial for enhancing AMD-based machine learning infrastructure. This role is pivotal in ensuring the efficient deployment of state-of-the-art large models, which are essential for various applications including data centers, artificial intelligence, gaming, and embedded systems. You will join a dynamic team dedicated to groundbreaking projects that push the limits of innovation and execution excellence.

In this position, you will be responsible for optimizing model execution, particularly GPU kernels, for both inference and training in a multi-GPU and multi-node environment. Your work will directly influence AMD's ability to deliver cutting-edge AI solutions efficiently and at scale.

You will engage in tasks such as developing and optimizing low-level GPU kernels to accelerate the performance of large machine learning models, maximizing computational efficiency, and reducing execution time while maintaining model accuracy. Additionally, you will design and implement strategies for distributed model training and inference across multiple GPUs and nodes, addressing challenges related to data and model parallelism. Performance profiling will be a key aspect of your role, as you will analyze system and application performance to identify bottlenecks and optimize hardware resource utilization. You will also explore model quantization techniques to minimize memory and computation overhead, particularly for edge and cloud deployments.

Your collaboration with machine learning researchers, software engineers, and infrastructure teams will be essential to integrate optimized kernels into production systems, and you will be responsible for creating detailed documentation of your optimizations and best practices.

Responsibilities

  • Develop and optimize low-level GPU kernels to accelerate inference and training of large machine learning models.
  • Design and implement strategies for distributed model training and inference across multiple GPUs and nodes.
  • Profile and analyze system and application performance to identify bottlenecks and areas for improvement.
  • Leverage parallel computing techniques to improve the scalability and performance of machine learning workloads.
  • Explore and apply model quantization techniques to reduce memory and computation overhead.
  • Develop benchmarks and testing procedures to assess the performance and stability of optimized models and frameworks.
  • Collaborate closely with machine learning researchers, software engineers, and infrastructure teams to integrate optimized kernels and solutions into production systems.
  • Create detailed documentation of optimizations, best practices, and implementation guidelines.

Requirements

  • A Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
  • Solid understanding of GPU-accelerated machine learning frameworks and runtimes such as ONNX, DeepSpeed, and vLLM.
  • Strong experience in low-level GPU kernel optimization.
  • Proficiency in CUDA and GPU programming.
  • Experience with distributed computing and multi-GPU environments.
  • Proficiency in performance profiling and optimization tools.
  • Solid programming skills in Python and/or C++.
  • Experience with deep learning frameworks (e.g., JAX, PyTorch, TensorFlow).
  • Excellent problem-solving skills and attention to detail.
  • Strong communication and teamwork skills.

Benefits

  • Base pay dependent on skills, qualifications, experience, and location.
  • Eligibility for incentives such as annual bonuses or sales incentives.