This job is closed
Meta - Menlo Park, CA
posted about 2 months ago
The Software Engineer, SystemML - Scaling / Performance role at Meta sits within the Network.AI Software team and works on the software stack around the NVIDIA Collective Communications Library (NCCL). The position focuses on enabling reliable and scalable distributed machine learning (ML) training, particularly for Generative AI (GenAI) and Large Language Models (LLMs). The team is responsible for improving the performance and reliability of distributed ML workloads across Meta's GPU infrastructure, contributing to innovations in ML products and applications.