NVIDIA - Santa Clara, CA
posted 5 months ago
NVIDIA's invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and form teams with the most inquisitive people in the world. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.
We are looking for a principal platform software architect who can lead the platform architecture of next-generation data center server products, handle bring-up, and drive a solution to production. In this role, you will be responsible for platform architecture and hardware bring-up of NVIDIA HGX GPU baseboards. You will engage in software architecture and design for various firmware components, applying an understanding of embedded system limitations and Linux kernel internals to meet performance, scalability, and resiliency requirements for firmware running on embedded devices. You will work closely with hardware teams to influence hardware design and review hardware architecture and schematics. Additionally, you will collaborate with internal and external team members to narrow down performance and resiliency requirements for firmware running on NVIDIA data center products. Your responsibilities will also include hands-on coding, code review, and BMC firmware development, including various manageability features for NVIDIA's server platforms.
You will actively engage in designing and developing a CI/CD framework to ensure firmware quality, writing and reviewing design documents, and reviewing QA test plans, while working closely with all collaborators to reach consensus on design and testability as per product requirements. You will design solutions for error handling, statistics, and configuration for CPUs, GPUs, DIMMs, SSDs, NICs, IB, PSUs, BMCs, FPGAs, CPLDs, and similar components, for enterprise readiness of NVIDIA server platforms. Furthermore, you will work with the security team to ensure that developed code aligns with product security goals, and you will mentor your team on best practices for writing efficient, bug-free code.