
Nvidia - Santa Clara, CA

posted 2 months ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Deep Learning Engineer position focuses on developing distributed backends for leading Deep Learning frameworks such as PyTorch, JAX, and TensorFlow. The role builds on proven task-based runtime systems to create a scalable platform for diverse model architectures, delivering performance improvements and debugging tools for AI models at large scale.

Responsibilities

  • Develop extensions to popular Deep Learning frameworks for experimentation with parallelization strategies.
  • Create compiler optimizations and parallelization heuristics to enhance AI model performance at scale.
  • Develop tools for performance debugging of AI models in large-scale environments.
  • Study and tune Deep Learning training workloads, including enterprise and academic models.
  • Support enterprise customers and partners in scaling novel models using the platform.
  • Collaborate with software and hardware teams to advance Deep Learning libraries.
  • Contribute to the development of runtime systems foundational to distributed GPU computing.

Requirements

  • BS, MS or PhD degree in Computer Science, Electrical Engineering or related field (or equivalent experience).
  • 5+ years of relevant industry experience or equivalent academic experience after BS.
  • Proficient in Python and C++ programming.
  • Strong background in parallel and distributed programming, preferably on GPUs.
  • Hands-on development experience with Machine Learning frameworks (e.g., PyTorch, TensorFlow, JAX, MXNet, scikit-learn).
  • Understanding of Deep Learning training in distributed contexts (multi-GPU, multi-node).
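The distributed-training requirement above centers on data parallelism: each GPU holds a model replica, computes gradients on its shard of the global batch, and an all-reduce averages those gradients so every replica applies the same update. A toy, stdlib-only sketch of that averaging step (real systems such as PyTorch DDP perform it with NCCL collectives across GPUs and nodes; plain lists stand in for tensors here):

```python
# Toy sketch of the gradient all-reduce at the heart of data-parallel
# deep learning training. Frameworks like PyTorch DDP do this with
# NCCL across GPUs/nodes; plain Python lists stand in for tensors.

def all_reduce_mean(worker_grads):
    """Average per-parameter gradients across workers (the all-reduce step)."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [
        sum(g[i] for g in worker_grads) / n_workers
        for i in range(n_params)
    ]

# Each "worker" (GPU) computed gradients on its own shard of the batch.
worker_grads = [
    [0.2, -0.4, 1.0],   # worker 0
    [0.4, -0.2, 0.6],   # worker 1
    [0.0, -0.6, 0.8],   # worker 2
    [0.2, -0.4, 1.2],   # worker 3
]

avg = all_reduce_mean(worker_grads)
# After the all-reduce, every replica applies the same averaged gradient,
# so the model copies stay in sync across GPUs and nodes.
```

This is the simplest of the parallelization strategies the role works with; tensor, pipeline, and sequence parallelism partition the model itself rather than the batch.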

Nice-to-haves

  • Experience with deep-learning compiler stacks such as XLA, MLIR, or TorchDynamo.
  • Background in performance analysis, profiling, and tuning of HPC/AI workloads.
  • Experience with CUDA programming and GPU performance optimization.
  • Familiarity with tasking or asynchronous runtimes, especially data-centric runtime systems such as Legion.
  • Experience building, debugging, profiling, and optimizing multi-node applications on supercomputers or the cloud.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Ongoing professional development opportunities