Nvidia - Santa Clara, CA

posted 6 months ago

Full-time - Senior
Santa Clara, CA
1,001-5,000 employees
Computer and Electronic Product Manufacturing

About the position

As a Senior Datacenter GPU Power Architect at NVIDIA, you will help shape the future of our datacenter products. Your primary responsibilities will be to set power targets for upcoming projects, track ASIC milestones, and drive Performance vs Power Analysis, which informs key decisions on our product roadmap. You will use machine learning techniques to build accurate power and performance models for our GPUs, CPUs, switches, and platforms.

In this role, you will need to understand the characteristics of Generative AI (GenAI) and High-Performance Computing (HPC) workloads at datacenter scale. That understanding will let you drive new hardware and software features that improve performance per watt (Perf@Watt). You will also perform pre-silicon thermal analysis for next-generation GPUs. This includes designing thermal use cases, creating power maps for thermal simulations, and driving architectural features that mitigate thermal issues. Your work will involve modeling and analyzing cutting-edge technologies such as chiplets and high-speed, high-density interconnects.

Collaboration is key in this position. You will work closely with cross-functional teams, including architects, designers, VLSI engineers, software developers, management, and marketing, to influence the product roadmap. Beyond technical expertise, the role requires strong interpersonal and organizational skills to thrive in a team-oriented environment.

Responsibilities

  • Set power targets for future projects.
  • Track ASIC milestones and drive Performance vs Power Analysis.
  • Deploy machine learning techniques to develop power and performance models of GPUs, CPUs, switches, and platforms.
  • Understand workload characteristics for GenAI/HPC workloads at datacenter scale.
  • Drive new HW/SW features for Perf@Watt improvements.
  • Conduct pre-silicon thermal analysis for next-gen GPUs.
  • Design thermal use cases and power maps for thermal simulations.
  • Drive thermal mitigation architectural features.
  • Model and analyze technologies like chiplets and high-speed interconnects.
  • Collaborate with cross-functional teams including Architects, Designers, VLSI, Software, Management, and Marketing.

Requirements

  • MSEE/MSCE (PhD preferred), or equivalent experience, in power/performance estimation and optimization techniques.
  • 5+ years of experience in relevant fields.
  • Strong knowledge of energy-efficient chip/system design fundamentals and related tradeoffs.
  • Familiarity with low power design techniques such as multi-VT, clock gating, power gating, and Dynamic Voltage-Frequency Scaling (DVFS).
  • Deep understanding of processors (GPU experience is a plus), servers, and system software architectures, along with their performance/power modeling techniques.
  • Familiarity with Python and silicon (Si) data analysis.
  • Familiarity with performance monitors/simulators used in modern processor architectures.
  • Strong interpersonal and organizational skills.

Nice-to-haves

  • Experience with machine learning techniques in power modeling.
  • Knowledge of advanced thermal management techniques.
  • Experience in high-performance computing environments.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Opportunities for professional development
  • Diversity and inclusion programs
© 2024 Teal Labs, Inc