Annapurna Labs (U.S.) Inc. - D63 - Seattle, WA

Full-time - Mid Level
Seattle, WA

About the position

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a software engineer on the Machine Learning Applications (ML Apps) team for AWS Neuron. The primary responsibility of this position is the development, enablement, and performance tuning of a wide variety of machine learning model families, including massive-scale large language models such as Llama2, GPT2, GPT3, and beyond, as well as Stable Diffusion, Vision Transformers, and many more.

The ML Apps team collaborates closely with compiler engineers and runtime engineers to create, build, and tune distributed inference solutions on Trn1. Experience optimizing inference performance for both latency and throughput on these large models using Python, PyTorch, or JAX is essential. The role will involve using DeepSpeed and other distributed inference libraries, with a focus on extending their capabilities for the Neuron-based system. The successful candidate will help lead efforts to build distributed inference support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks. Tuning these models for the highest performance and maximum efficiency on the custom AWS Trainium and Inferentia silicon and the Trn1 and Inf1 servers is a critical aspect of this role.

In a typical day, you will design and code solutions that drive efficiencies in software architecture, create metrics, implement automation, and resolve the root causes of software defects. You will also build high-impact solutions for a large customer base, participate in design discussions, conduct code reviews, and communicate with both internal and external stakeholders. The work environment is dynamic and startup-like, where you will always be focused on the most important tasks. The team is dedicated to supporting new members, fostering an environment that celebrates knowledge-sharing and mentorship, and ensuring that team members feel empowered to take on more complex tasks over time.

Responsibilities

  • Develop, enable, and performance-tune a wide variety of ML model families, including large language models and Vision Transformers.
  • Optimize inference performance for both latency and throughput using Python, PyTorch, or JAX.
  • Help lead efforts to build distributed inference support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks.
  • Tune models for maximum efficiency on AWS Trainium and Inferentia silicon and the Trn1 and Inf1 servers.
  • Design and code solutions that improve the efficiency of the software architecture.
  • Create metrics and implement automation to enhance processes.
  • Resolve root causes of software defects and improve overall software quality.
  • Participate in design discussions and code reviews, providing technical input to drive business decisions.

Requirements

  • 3+ years of non-internship professional software development experience.
  • 2+ years of non-internship design or architecture experience of new and existing systems.
  • Experience programming with at least one software programming language.

Nice-to-haves

  • 3+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations.
  • Bachelor's degree in computer science or equivalent.

Benefits

  • Competitive salary based on market location and job-related knowledge, skills, and experience.
  • Equity and sign-on payments as part of total compensation package.
  • Full range of medical, financial, and other benefits.