We are seeking an experienced Engineering Manager to join our Inference Scalability and Capability team. This team is responsible for building and maintaining the critical systems that serve our LLMs to a diverse set of consumers. As the cornerstone of our service delivery, the team focuses on scaling inference systems, ensuring reliability, optimizing compute resource efficiency, and developing new inference capabilities. The team tackles complex distributed systems challenges across our entire inference stack, from optimal request routing to efficient prompt caching.