
Nvidia - Santa Clara, CA

posted 2 months ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Deep Learning Engineer position focuses on developing distributed backends for leading Deep Learning frameworks such as PyTorch, JAX, and TensorFlow. The role builds on proven task-based runtime systems to create a scalable platform for diverse model architectures, delivering performance improvements and debugging tools for AI models at large scale.

Responsibilities

  • Develop extensions to popular Deep Learning frameworks for experimentation with parallelization strategies.
  • Create compiler optimizations and parallelization heuristics to enhance AI model performance at scale.
  • Develop tools for performance debugging of AI models in large-scale environments.
  • Study and tune Deep Learning training workloads, including enterprise and academic models.
  • Support enterprise customers and partners in scaling novel models using the platform.
  • Collaborate with software and hardware teams to advance Deep Learning libraries.
  • Contribute to the development of runtime systems foundational to distributed GPU computing.

Requirements

  • BS, MS or PhD degree in Computer Science, Electrical Engineering or related field (or equivalent experience).
  • 5+ years of relevant industry experience or equivalent academic experience after BS.
  • Proficient in Python and C++ programming.
  • Strong background in parallel and distributed programming, preferably on GPUs.
  • Hands-on development experience with Machine Learning frameworks (e.g., PyTorch, TensorFlow, JAX, MXNet, scikit-learn).
  • Understanding of Deep Learning training in distributed contexts (multi-GPU, multi-node).
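The distributed-training requirement above centers on data parallelism: each GPU holds a model replica, computes gradients on its shard of the global batch, and an all-reduce averages those gradients so every replica applies the same update. A toy, stdlib-only sketch of that averaging step (real systems such as PyTorch DDP perform it with NCCL collectives across GPUs and nodes; plain lists stand in for tensors here):

```python
# Toy sketch of the gradient all-reduce at the heart of data-parallel
# deep learning training. Frameworks like PyTorch DDP do this with
# NCCL across GPUs/nodes; plain Python lists stand in for tensors.

def all_reduce_mean(worker_grads):
    """Average per-parameter gradients across workers (the all-reduce step)."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [
        sum(g[i] for g in worker_grads) / n_workers
        for i in range(n_params)
    ]

# Each "worker" (GPU) computed gradients on its own shard of the batch.
worker_grads = [
    [0.2, -0.4, 1.0],   # worker 0
    [0.4, -0.2, 0.6],   # worker 1
    [0.0, -0.6, 0.8],   # worker 2
    [0.2, -0.4, 1.2],   # worker 3
]

avg = all_reduce_mean(worker_grads)
# After the all-reduce, every replica applies the same averaged gradient,
# so the model copies stay in sync across GPUs and nodes.
```

This is the simplest of the parallelization strategies the role works with; tensor, pipeline, and sequence parallelism partition the model itself rather than the batch.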

Nice-to-haves

  • Experience with deep-learning compiler stacks such as XLA, MLIR, or TorchDynamo.
  • Background in performance analysis, profiling, and tuning of HPC/AI workloads.
  • Experience with CUDA programming and GPU performance optimization.
  • Familiarity with tasking or asynchronous runtimes, especially data-centric runtime systems such as Legion.
  • Experience building, debugging, profiling, and optimizing multi-node applications on supercomputers or the cloud.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Ongoing professional development opportunities