This job is closed


Staff Software Engineer, Machine Learning Performance, Tensor Processing Unit

Google, posted 18 days ago

$197,000 to $291,000 per year

Full-time

Mountain View, CA

Web Search Portals, Libraries, Archives, and Other Information Services

About the position

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google's needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

In this role, you will be responsible for the performance of machine learning and AI workloads and for extracting maximum efficiency from them. You will drive Google's ML performance to the state of the art using fleet-scale and benchmark analysis and out-of-the-box automatic optimizations.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers, and the billions of people who use Google services around the world. We prioritize security, efficiency, and reliability across everything we do, from developing our latest TPUs to running a global network, while driving toward shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud's Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

Responsibilities

Identify and maintain Large Language Model (LLM) training and serving benchmarks, used by industry and the Machine Learning (ML) community, to identify performance opportunities and drive TensorFlow/JAX performance on TPUs.
Work on scaling numeric and algorithmic optimizations to Google products and ML models, including quantization, sparsity, and other model compression techniques, as well as new model architectures, optimizers, and training techniques that solve ML tasks more efficiently.
Engage with Google product teams to solve their LLM performance problems, including onboarding new LLMs and products onto Google's new TPU hardware and enabling LLMs to train efficiently on thousands of TPUs.
Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions at Google.

Requirements

Bachelor's degree or equivalent practical experience.
8 years of experience testing and launching software products.
5 years of experience with software development in one or more programming languages (e.g., Python, C, C++).
Experience in performance analysis and optimization including system architecture, performance modeling, benchmarking or machine learning infrastructure.

Nice-to-haves

Master's degree or PhD in Engineering, Computer Science, or a related technical field.
3 years of experience in a matrixed organization, including in a technical leadership role leading project teams and setting technical direction.
Experience in compiler optimizations or related fields.
Experience with Machine Learning systems (e.g., background theory, TensorFlow).