NVIDIA - Santa Clara, CA
posted 2 months ago
As a Site Reliability Engineer at NVIDIA, you will lead the design and implementation of cutting-edge GPU compute clusters that support AI research. The role focuses on building and operating these clusters for high reliability, efficiency, and performance, and on driving automation and foundational improvements that make researchers more productive. You will join a diverse team that values intellectual curiosity and problem-solving, in a collaborative environment that encourages innovation and self-direction.