This job is closed

Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

Amazon.com • posted 4 days ago

$151,300 - $261,500/Yr

Full-time • Senior

Cupertino, CA

General Merchandise Retailers

About the position

We are seeking an experienced software engineer with low-latency networking or interconnect expertise to improve the customer experience by designing systems that scale network-intensive workloads across thousands of CPUs, GPUs, and TPUs. This role is at the forefront of AI/ML: we spend a good deal of the day optimizing networking for the latest AI workloads, such as LLMs.

Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

Our ideal candidate will have extensive experience in low-latency networking and collective operations, such as HPC network fabrics or machine learning accelerator cluster systems. Experience in high-frequency trading networking, high-speed wireless networking, or low-latency interconnects such as PCIe or CXL is also applicable. Proficiency in C/C++ and a deep understanding of Linux and kernel-level programming are essential. Strong problem-solving skills and the ability to troubleshoot complex networking issues are required, along with excellent communication skills to work effectively in a collaborative team environment.

If you like solving hard infrastructure problems, want to work with HPC and ML customers, and want to iterate fast and deliver meaningful solutions at scale, then come join us!

Responsibilities

Design and optimize networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.
Collaborate with cross-functional teams and engage with customers to gather feedback and continuously improve offerings.
Participate in innovative learning experiences and benefit offerings.
Hunt for performance bottlenecks and optimize customers' heavy ML/AI workloads.
Mentor junior engineers and participate in code reviews.

Requirements

5+ years of non-internship professional software development experience.
5+ years of programming experience with at least one software programming language.
5+ years of experience leading the design or architecture of new and existing systems.
5+ years of full software development life cycle experience.
Experience as a mentor or tech lead, or leading an engineering team.