NVIDIA - Santa Clara, CA
posted 12 days ago
The Senior Site Reliability Engineer (SRE) at NVIDIA designs, builds, and maintains large-scale production systems with a focus on high efficiency and availability. The role applies software and systems engineering practices to maximize the reliability and uptime of GPU cloud services, while still enabling developers to implement changes through careful planning. The position emphasizes automation, performance tuning, and optimization of production systems, within a culture that values diversity, intellectual curiosity, and problem-solving.