Nvidia - Santa Clara, CA

posted 25 days ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

NVIDIA is seeking highly motivated engineers to enhance its AI Infrastructure, focusing on the architecture, design, and implementation of next-generation DGX cloud clusters. This role involves full-stack deployment, including hardware architecture, workload orchestration, and application performance tuning, aimed at advancing NVIDIA's capabilities in AI-based applications and data science.

Responsibilities

  • Lead technical activities for data centers, with a focus on hybrid deployments spanning cloud and on-premises environments.
  • Provide expertise in infrastructure workflows, including hardware, workload orchestration, and application tuning.
  • Provide fast and creative solutions for complex problems and write effective, clear, and reliable architecture specifications.
  • Translate requirements into vision, architecture, and roadmap.
  • Collaborate with engineering teams across NVIDIA to ensure seamless integration across the stack, from hardware to AI training applications.

Requirements

  • Master's or PhD in Computer Science, Computer Engineering, Physics, or equivalent experience.
  • 10+ years of experience in the field.
  • Coursework in data science, deep learning, or machine learning.
  • Ability to shift seamlessly between Linux system environments and Python programming.
  • Programming skills in one or more high-level languages (C, C++, Go, Rust, etc.).
  • System-level experience with both hardware and software.
  • Strong problem-solving skills and customer-facing communication skills.
  • Strong design, coding, analytical, and debugging skills.
  • Passion for continuous learning and knowledge transfer.
  • Ability to work concurrently with multiple groups locally and abroad.

Nice-to-haves

  • Experience with GPU-accelerated deep learning and data science.
  • Experience using TensorFlow, PyTorch, or other deep learning frameworks.
  • Experience working with Docker containers, Slurm, Terraform, and Kubernetes.
  • CUDA programming and NCCL experience.
  • HPC programming experience including MPI, OpenACC, or other parallel programming tools.
  • Hands-on experience with DGX Cloud, NVIDIA AI Enterprise software, Base Command Manager, NeMo, and NVIDIA Inference Microservices.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Diversity and inclusion programs
  • Ongoing learning and development opportunities