This job is closed

NVIDIA - Santa Clara, CA

Posted about 2 months ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Senior Deep Learning Performance Architect at NVIDIA develops innovative architectures that improve deep learning performance and efficiency. The role involves analyzing performance, cost, and power trade-offs, and understanding how hardware and software architectures interact, in order to optimize for future algorithms and applications. The architect will collaborate with software, product, and research teams to guide the development of deep learning hardware and software.

Responsibilities

  • Develop innovative architectures to extend the state of the art in deep learning performance and efficiency.
  • Analyze performance, cost, and power trade-offs by developing analytical models, simulators, and test suites.
  • Understand and analyze how the interplay between hardware and software architectures affects future algorithms, programming models, and applications.
  • Develop, analyze, and harness groundbreaking deep learning frameworks, libraries, and compilers.
  • Actively collaborate with software, product, and research teams to guide the direction of deep learning hardware and software.

Requirements

  • MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
  • 6+ years of meaningful work experience.
  • Strong background in GPU or deep learning ASIC architecture for training and/or inference.
  • Experience with performance modeling, architecture simulation, profiling, and analysis.
  • Solid foundation in machine learning and deep learning.
  • Strong programming skills in Python, C, and C++.

Nice-to-haves

  • Background in deep neural network training, inference, and optimization in leading frameworks (e.g., PyTorch, JAX, TensorRT).
  • Experience with relevant libraries, compilers, and languages: cuDNN, cuBLAS, CUTLASS, MLIR, Triton, CUDA, OpenCL.
  • Experience with the architecture of, or workload analysis on, other DL accelerators.
  • Demonstrated self-motivation, with a knack for critical thinking and thinking outside the box.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Diversity and inclusion programs