Google - Sunnyvale, CA

posted 3 days ago

Full-time - Mid Level
Sunnyvale, CA
Web Search Portals, Libraries, Archives, and Other Information Services

About the position

The Machine Learning III GPU Performance Engineer at Google is responsible for optimizing GPU performance for large language models (LLMs) and ensuring efficient training and serving of these models on a massive scale. This role involves collaborating with product teams to address performance issues, conducting architecture simulations, and analyzing performance metrics to enhance system efficiency.

Responsibilities

  • Identify and maintain Large Language Model (LLM) training and serving benchmarks that are representative of Google production and the ML community.
  • Engage with Google product teams to solve ML model performance problems and onboard new LLMs on GPU hardware.
  • Run architecture-level simulations on GPU designs and perform roofline analysis to guide internal teams.
  • Run performance benchmarks on GPU hardware using internal and external tools.
  • Analyze performance and efficiency metrics to identify bottlenecks and design solutions at Google fleetwide scale.

Requirements

  • Bachelor's degree or equivalent practical experience.
  • 2 years of experience in software development (e.g., C++, Python) and with data structures/algorithms.
  • 1 year of experience testing, maintaining, or launching software products.
  • 1 year of experience with software design and architecture.
  • 1 year of experience with performance, systems data analysis, visualization tools, or debugging.

Nice-to-haves

  • Master's degree or PhD in Computer Science or related technical field or equivalent practical experience.
  • Experience in optimizing GPU-accelerated environments with an understanding of large language models (LLMs) and training/inference pipelines.
  • Proven ability to analyze and optimize GPU performance for computational tasks, including benchmarking, profiling, and identifying bottlenecks.

Benefits

  • Health insurance
  • 401(k)
  • Paid holidays
  • Flexible scheduling
  • Professional development
  • Gym membership