NVIDIA - Santa Clara, CA

posted about 1 month ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Software Architect for Data Center Platform Simulation and Virtualization at NVIDIA will be responsible for designing and owning the system architecture of simulators for DGX and HGX Server platforms. This role is crucial in building scalable systems that leverage NVIDIA's advanced technologies, including GPUs, NVLink, and InfiniBand networking. The architect will collaborate with engineering teams and cloud service providers to ensure the successful market introduction of these products.

Responsibilities

  • Drive requirements, architecture, and roadmap of NVIDIA DGX Simulation platforms.
  • Engage with major customers to understand their requirements and align with their roadmap and adoption strategy.
  • Work closely with globally distributed hardware modeling, kernel, and platform driver teams.
  • Build and deliver a full server simulation platform to internal and external NVIDIA partners.
  • Mentor architects and engineering teams to grow them into future leaders.
  • Make key technical decisions even when faced with ambiguity, and mitigate execution risks by following a shift-left strategy.

Requirements

  • BS degree or higher in Computer Science or a related field, or equivalent experience.
  • 10+ years of relevant experience in virtualization and HW simulation/emulation technologies.
  • Proven experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.
  • Previous experience with hardware interfaces such as PCIe, SPI, and I3C, and with Linux boot solutions on x86 and ARM class platforms.
  • Good understanding of hypervisors and HW emulators such as QEMU, KVM, VDK, and Simics.
  • Experience in out-of-band and in-band management architectures.
  • Proficiency in C/C++ with strong software development, optimization, and user- and kernel-mode debugging skills.
  • Strong interpersonal & communication skills to work with a globally distributed engineering team.

Nice-to-haves

  • Experience building a shift-left strategy for HW and SW stack bring-up using simulators and emulators.
  • Contributions to the QEMU/KVM open-source repositories.
  • Experience in Verilog and SystemC.
  • Knowledge of device management protocols such as MCTP, PLDM and RDE.
  • Knowledge of system management protocols such as Redfish and IPMI.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Diversity and inclusion programs