Meta - Bellevue, WA
posted 5 months ago
Meta is seeking a Research Scientist to join our Research & Development teams, focusing on Systems Machine Learning (ML) with an emphasis on software/hardware co-design for inference. The ideal candidate will have industry experience in AI Infrastructure and will apply it to some of the most critical and exciting challenges in the web domain. This position is available in multiple locations.

The Kernel team maximizes inference performance for Generative AI and Recommendation models by developing specialized, high-performance kernels. The team developed and deployed the first FP8 kernel in Meta's production environment, along with FBGEMM TBE. By continuing to advance our kernel optimization capabilities, we aim to improve user experiences and drive innovation in Generative AI and Recommendation systems.

The E2E Performance team optimizes the end-to-end performance of Generative AI and Recommendation models. We use a range of parallelism strategies and distributed inference techniques to improve time-to-incremental-token (TTIT) and time-to-first-token (TTFT) for large language models (LLMs) and latent diffusion models (LDMs). Our achievements include enabling AMD GPUs for GenAI production applications and optimizing their performance. Our ongoing work aims to keep improving these models' performance, giving users more responsive and seamless interactions with Generative AI.