NVIDIA - Seattle, WA

posted 2 months ago

Full-time - Senior
Seattle, WA
Computer and Electronic Product Manufacturing

About the position

NVIDIA is seeking a Senior Machine Learning Engineer for Quantized Training to support next-generation recipes for mixed-precision training. In this role, you will distill large language model (LLM) research literature into its core components, translate that literature into scalable experiments, generate insights that support or refute the efficacy of various techniques, and produce reproducible training recipes. The position requires a deep understanding of the latest advancements in quantized training and the ability to apply that knowledge in practical settings.

Your responsibilities will include reviewing state-of-the-art literature in quantized training, building robust, reproducible, and portable training recipes, and providing engineering support to customers using both hardware and software approaches. You will collaborate closely with hardware, software, and research teams to assess and adopt deep learning algorithmic advancements in quantization, and you will work with production software teams to bring these recipes into production workflows effectively and efficiently.

This role is critical in shaping the future of AI at NVIDIA: you will be at the forefront of integrating and optimizing deep learning frameworks on the most advanced GPUs. You will have the opportunity to help shape long-term opportunities that expand NVIDIA's impact on the datacenter and beyond, while working in a creative and autonomous environment that encourages innovation.

Responsibilities

  • Review state-of-the-art literature in quantized training
  • Build robust, reproducible, and portable training recipes
  • Provide engineering support to customers using hardware and software approaches
  • Collaborate closely with hardware, software, and research teams to assess and adopt deep learning algorithmic advancements in quantization
  • Work with production software teams to realize recipes in production workflows

Requirements

  • Experience with PyTorch or similar frameworks such as JAX/XLA
  • Proficiency in the mathematics of machine learning
  • Familiarity with FP8 for training
  • Published research or significant contributions to the field of AI, particularly in algorithm development for hardware-software co-design
  • PhD or M.S. degree, or equivalent experience, in Computer Science or a related field
  • 5+ years of experience working in ML/AI
  • Strong written and oral communication skills
  • Strong programming skills and ability to debug ML systems

Nice-to-haves

  • Experience in LLM training, fine-tuning, and optimization (quantization, sparsity)
  • Familiarity with MX formats for training
  • Experience with Transformer Engine, Megatron-LM, or NeMo

Benefits

  • Equity
  • Comprehensive health benefits
  • Flexible work environment
  • Opportunities for professional development