Nvidia - Santa Clara, CA

posted about 1 month ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Senior Software Architect for Data Center Systems at NVIDIA will lead the design and development of innovative server systems tailored for GPU-accelerated applications, particularly in Deep Learning. This role involves collaborating with various teams to create a comprehensive software and firmware stack, ensuring that the systems meet customer needs and align with NVIDIA's strategic roadmap.

Responsibilities

  • Lead software activities for NVIDIA's deep learning server platforms from design through production.
  • Drive the system architecture for a complex server platform in a multi-functional environment.
  • Partner with application software, libraries, system software, and firmware teams to design complete software solutions for new server platforms.
  • Work directly with major customers to understand their requirements and align their roadmap with NVIDIA's roadmap.
  • Collaborate with business partners and vendors to shape their products to meet NVIDIA's needs.
  • Develop a roadmap of new technologies and protocols and drive their design and adoption.
  • Mentor architects and engineering teams to grow them into future leaders.
  • Make key technical decisions for designs involving complex inter-component dependencies.

Requirements

  • Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.
  • Understanding of HPC or deep learning workloads and the use of accelerated computing platforms.
  • Expertise in out-of-band and in-band management architectures.
  • Knowledge of server system architecture and the implications of architecture decisions on the overall performance of end applications.
  • Demonstrable experience in implementing a shift-left strategy to de-risk program execution.
  • Excellent written and verbal communication skills.
  • BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
  • 10+ years of experience in system architecture and design.

Nice-to-haves

  • Knowledge of cloud- and cluster-level deployment and management systems.
  • Strong background in device management protocols such as Redfish, IPMI, MCTP, PLDM, and RDE.
  • Knowledge of storage and networking technologies.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Paid time off
  • Retirement savings plan