Apple - Cupertino, CA

posted 4 days ago

Full-time - Mid Level
Cupertino, CA
Computer and Electronic Product Manufacturing

About the position

The role involves optimizing end-to-end system performance for distributed machine learning workloads within Apple's Machine Learning Platform Technology & Infra team. The position requires collaboration with machine learning researchers and key partners across the company to enhance the efficiency and resiliency of large-scale ML applications.

Responsibilities

  • Engage with ML researchers to optimize end-to-end performance of large-scale distributed ML workloads
  • Analyze workload metrics to identify sources of inefficiency, and work with users to understand and optimize their ML workloads
  • Conduct workload analysis based on benchmarking key workloads on deployed systems
  • Improve large-scale training resiliency by optimizing applications and frameworks for better recovery from failures and preemptions
  • Influence architecture, design, development, and operations of next-generation ML accelerator systems based on workload insights

Requirements

  • Experience working with large-scale parallel and distributed accelerator-based systems
  • Experience optimizing the performance of AI workloads at scale
  • Experience developing code in one or more training frameworks (such as PyTorch, TensorFlow, or JAX)
  • Strong communicator with ability to analyze complex and ambiguous problems
  • Programming and software design skills (proficiency in C/C++ and/or Python)
  • Experience working in a highly collaborative environment and promoting a teamwork mentality
  • Bachelor's degree in Computer Science and 7+ years of work experience

Nice-to-haves

  • Deep understanding of computer systems and the interactions between HW and SW
  • Experience with performance analysis and optimization on cloud accelerators
  • Advanced degree in CS

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounted products and free services
  • Reimbursement for certain educational expenses including tuition
  • Discretionary bonuses or commission payments
  • Relocation assistance