NVIDIA - Santa Clara, CA
posted 10 days ago
The Senior Site Reliability Engineer (SRE) at NVIDIA designs, builds, and maintains large-scale production systems with high efficiency and availability. The role emphasizes observability and telemetry as the means of ensuring the reliability and uptime of GPU cloud services. SREs at NVIDIA combine software and systems engineering practices to automate processes, optimize performance, and improve system reliability, while fostering a culture of diversity and collaboration.