Meta - Seattle, WA
posted 4 days ago
The Software Engineer, SystemML - AI Networking role sits within the AI Networking Software team at Meta and focuses on developing and enhancing the software stack around the NVIDIA Collective Communications Library (NCCL). The position is critical for enabling reliable and scalable distributed machine learning (ML) training on Meta's large-scale GPU infrastructure, particularly for Generative AI (GenAI) and Large Language Models (LLMs). The team aims to improve the performance and reliability of distributed ML workloads so that Meta's ML products can use extensive GPU resources effectively.