NVIDIA - Santa Clara, CA

posted 5 months ago

Full-time - Principal
Santa Clara, CA
5,001-10,000 employees
Computer and Electronic Product Manufacturing

About the position

NVIDIA's invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and form teams with the most inquisitive people in the world. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.
We are looking for a principal platform software architect who can lead next-generation data center server product platform architecture and bring-up, and drive a solution to production. In this role, you will be responsible for platform architecture and hardware bring-up of NVIDIA HGX GPU baseboards. You will architect and design software for various firmware components, drawing on an understanding of embedded system limitations and Linux kernel internals to meet performance, scalability, and resiliency requirements for firmware running on embedded devices. You will work closely with hardware teams to influence hardware design and review hardware architecture and schematics. You will also collaborate with internal and external team members to define performance and resiliency requirements for firmware running on NVIDIA data center products. Your responsibilities will also include hands-on coding, code review, and BMC firmware development, including various manageability features for NVIDIA's server platforms.
You will actively engage in designing and developing a CI/CD framework to ensure firmware quality, writing and reviewing design documents, and reviewing QA test plans, while working closely with all collaborators to reach consensus on design and testability per product requirements. You will design solutions for errors, statistics, and configuration appropriate to CPU, GPU, DIMM, SSDs, NICs, IB, PSU, BMC, FPGA, CPLD, etc. for enterprise readiness of NVIDIA server platforms. Furthermore, you will work with the security team to ensure developed code aligns with product security goals, and mentor your team on best practices for writing efficient and bug-free code.

Responsibilities

  • Lead next-generation data center server product platform architecture.
  • Engage in platform architecture and hardware bring-up of NVIDIA HGX GPU baseboards.
  • Design and develop software architecture for various firmware.
  • Understand embedded system limitations and Linux kernel internals.
  • Influence hardware design and review HW architecture & schematics.
  • Collaborate with internal and external teams to define performance and resiliency requirements.
  • Conduct hands-on coding, code review, and BMC firmware development.
  • Design and develop a CI/CD framework for firmware quality assurance.
  • Write and review design documents and QA test plans.
  • Design solutions for errors, stats & configuration for enterprise readiness.
  • Instrument code to ensure maximum code coverage and automate unit tests.
  • Mentor team on best practices for writing efficient and bug-free code.
  • Work with the security team to ensure code aligns with product security goals.

Requirements

  • Bachelor of Science Degree (or higher) or equivalent experience in Electrical or Computer Engineering or Computer Science.
  • 15+ years of active development using C / C++ as primary programming language in a Linux environment.
  • 8+ years of experience in technically leading a team in delivering large firmware or software projects.
  • 5+ years of experience working with internal and external stakeholders to define requirements and convert them into architecture.
  • Proven track record of delivering solutions to customers with a deep understanding of deployments at scale.
  • Domain expertise in data center firmware/software development on x86 or ARM platforms.
  • Board bring-up expertise with hands-on experience in device drivers for I2C/I3C, SPI, PCIe, SMBus, mailbox, etc.
  • Understanding of the REST architectural style, especially JSON over HTTPS with OAuth.
  • Strong programming skills in C/C++ in a Linux operating environment and strong understanding of Linux kernel internals.
  • Excellent written and oral communication skills, a strong work ethic, and a high sense of teamwork.

Nice-to-haves

  • Consistent track record in delivering 100,000+ lines of code for a single project.
  • Proven record in technically leading an organization of 30+ engineers.
  • Expertise in system software and platform security for x86/ARM based Rack/Blade server systems.

Benefits

  • Health insurance coverage
  • 401k plan with matching contributions
  • Equity options
  • Maternity and paternity leave
  • Unlimited time off depending on workloads and project timing