Qualcomm - San Diego, CA

Full-time - Principal
San Diego, CA
Computer and Electronic Product Manufacturing

About the position

The Principal AI Performance Architect at Qualcomm is responsible for driving functional, performance, and power enhancements in hardware to enable state-of-the-art training capabilities for AI models. This role involves understanding trends in machine learning network design, collaborating with customers to determine hardware requirements, and architecting enhancements for efficient AI model training. The position requires a deep understanding of both software and hardware architecture, particularly in the context of AI accelerators and GPUs.

Responsibilities

  • Understand trends in ML network design through customer engagements and the latest academic research, and determine how these trends will affect both SW and HW design
  • Work with customers to determine hardware requirements for AI training systems
  • Analyze current accelerator and GPU architectures
  • Architect enhancements required for efficient training of AI models
  • Design and architect flexible computational blocks supporting various data types and precisions
  • Develop memory technology and subsystems optimized for capacity, bandwidth, and compute
  • Design scale-out and scale-up architectures including switches and NoCs
  • Optimize for power and perform competitive analysis
  • Codesign hardware with software/GenAI requirements
  • Define performance models to demonstrate the effectiveness of architecture proposals
  • Conduct pre-silicon prediction of performance for various ML training workloads
  • Analyze performance/area/power trade-offs for future HW and SW ML algorithms

Requirements

  • Master's degree in Computer Science, Engineering, Information Systems, or related field
  • 3+ years of hardware engineering experience defining the architecture of GPUs or accelerators used for training AI models
  • In-depth knowledge of NVIDIA/AMD GPU capabilities and architectures
  • Knowledge of LLM architectures and their HW requirements

Nice-to-haves

  • Knowledge of computer architecture, digital circuits and hardware simulators
  • Knowledge of communication protocols used in AI systems
  • Knowledge of Network-on-Chip (NoC) designs used in System-on-Chip (SoC) designs
  • Understanding of various memory technologies used in AI systems
  • Experience in modeling hardware and workloads to extract performance and power estimates
  • High-level hardware modeling experience preferred
  • Knowledge of AI Training systems such as NVIDIA DGX and NVL72
  • Experience training and fine-tuning LLMs using distributed training frameworks such as DeepSpeed and FSDP
  • Knowledge of front-end ML frameworks (e.g., TensorFlow, PyTorch) used for training ML models
  • Strong communication skills (written and verbal)
  • Detail-oriented with strong problem-solving, analytical, and debugging skills
  • Demonstrated ability to learn, think, and adapt in a fast-changing environment
  • Ability to code in C++ and Python
  • Knowledge of a variety of classes of ML models (e.g., CNNs, RNNs)

Benefits

  • Competitive annual discretionary bonus program
  • Opportunity for annual RSU grants
  • Comprehensive benefits package designed to support success at work, at home, and at play