Nvidia - Durham, NC
posted 3 months ago
As an SRE focused on metrics reporting at Nvidia, you will collaborate with cross-functional teams, including software engineers, data scientists, and operations personnel. Your primary responsibility will be to monitor and optimize our systems by collecting, analyzing, and presenting key performance indicators (KPIs) that drive operational excellence and inform strategic decisions. This position is integral to improving the use of our AI/ML and chip development infrastructure so that our engineering teams can develop at an unprecedented speed.

In this role, you will be involved in the full life cycle of tool development, from testing to deployment. You will work within a diverse team to provide operational and strategic metrics that empower engineers to improve productivity and efficiency. A significant aspect of your work will be to continuously enhance our chip development process through better observability, directly improving overall quality and reducing the time to market for our next-generation chips.

Your contributions will impact not only the immediate team but also Nvidia's broader mission to amplify human creativity and intelligence through innovative technology. This is an opportunity to be part of a company at the forefront of AI and accelerated computing, tackling challenges that matter to the world.