Annapurna Labs (U.S.) Inc. - D63 - Seattle, WA
posted 3 months ago
AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a software engineer on the Machine Learning Applications (ML Apps) team for AWS Neuron. The primary responsibility of this position is the development, enablement, and performance tuning of a wide variety of machine learning model families, including massive-scale large language models such as Llama 2, GPT-2, GPT-3, and beyond, as well as Stable Diffusion, Vision Transformers, and many more.

The ML Apps team collaborates closely with compiler and runtime engineers to create, build, and tune distributed inference solutions on Trn1. Experience optimizing inference performance for both latency and throughput on these large models using Python, PyTorch, or JAX is essential. The role will involve using DeepSpeed and other distributed inference libraries, with a focus on extending their capabilities for Neuron-based systems. The successful candidate will help lead efforts to build distributed inference support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks. Tuning these models for the highest performance and maximum efficiency on customers' AWS Trainium and Inferentia silicon and on Trn1 and Inf1 servers is a critical aspect of this role.

In a typical day, you will design and code solutions that drive efficiencies in the software architecture, create metrics, implement automation, and resolve the root causes of software defects. You will also build high-impact solutions for a large customer base, participate in design discussions, conduct code reviews, and communicate with both internal and external stakeholders. The work environment is dynamic and startup-like, and you will always be focused on the most important tasks.
The team is dedicated to supporting new members, fostering an environment that celebrates knowledge-sharing and mentorship, and ensuring that team members feel empowered to take on more complex tasks in the future.