NVIDIA - Santa Clara, CA

posted 25 days ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

As a Product Architect at NVIDIA, you will play a crucial role in designing and overseeing the development of cutting-edge AI products and solutions. This position focuses on creating reference designs for powerful AI clusters, balancing technical feasibility with user needs and market demands. You will be at the forefront of technology trends in AI infrastructure, contributing to the transformation of ideas into functional products throughout their lifecycle.

Responsibilities

  • Design the next-gen datacenter-scale AI infrastructure using NVIDIA GPUs, compute servers, networking, and storage technologies.
  • Stay current on technology trends in AI infrastructure, workload orchestration, MLOps platforms, and GenAI applications, and propose new product concepts.
  • Architect detailed product designs, including specifications for functionality, performance, integrations, hardware, and datacenter requirements.
  • Lead prototyping and testing processes to validate product design and functionality with an emphasis on performance and scalability.
  • Iterate and refine designs based on feedback, testing results, and evolving requirements.
  • Coordinate with cross-functional teams to ensure seamless integration of various components and technologies.
  • Work closely with partners and customers to conceptualize ideal solutions and build proofs of concept (POCs) and minimum viable products (MVPs).
  • Provide comprehensive documentation for product designs, including architectural diagrams, technical specifications, and user guides.
  • Develop targeted content such as whitepapers, blogs, and curriculum for customers, partners, and users.

Requirements

  • 12+ years of experience designing datacenter-scale HPC infrastructure as an Infrastructure Architect, Solutions Architect, Principal Engineer, or in a similarly technical role.
  • Bachelor's Degree in Computer Science or a related field, or equivalent experience.
  • Strong background in technologies that enable large-scale infrastructure, management of complex systems, networks, storage, and datacenter equipment.
  • Familiarity with job schedulers, workload orchestrators, high-performance storage, and complex network topologies.
  • Extensive experience with benchmarking systems and analyzing performance bottlenecks in large-scale AI/HPC infrastructure.
  • Exceptional communication skills, with the ability to translate complex technical details for diverse audiences.
  • A passion for staying current with the latest advances in AI, HPC, and related fields, with a focus on hardware and large-scale infrastructure innovations.

Nice-to-haves

  • Prior first-hand experience building large-scale AI infrastructure.
  • Knowledge of end-to-end AI workloads including training, fine-tuning, inference at scale, agentic & RAG-based workflows, and model evaluation.
  • Advanced certifications or publications in AI, deep learning, or related fields, and experience leading high-impact projects or initiatives in innovative tech domains.
  • Proactive contributions to open-source projects or active involvement in tech communities.
  • Proven ability to mentor and uplift junior team members or peers, and unique problem-solving experience where unconventional thinking led to breakthrough solutions.

Benefits

  • Equity and benefits eligibility based on location and experience.