Microsoft - Mountain View, CA

posted 10 days ago

Full-time - Mid Level
Onsite - Mountain View, CA
Publishing Industries

About the position

The Senior Technical Program Manager for Copilot will play a crucial role in managing the end-to-end development and deployment of infrastructure supporting AI model training and inferencing. This position requires a blend of technical expertise and effective people management skills, focusing on delivering high-performance AI solutions in a fast-paced environment. The role involves collaboration with engineers, researchers, and stakeholders to ensure project success and alignment with infrastructure needs.

Responsibilities

  • Lead the planning, execution, and delivery of infrastructure projects that support AI model training and inferencing for Copilot.
  • Define project scope, goals, and deliverables that support infrastructure objectives, ensuring alignment with cross-functional teams.
  • Oversee the development of scalable and efficient compute environments, optimizing infrastructure for high-performance AI workloads.
  • Oversee the creation and enhancement of tools that support researchers in efficiently running, monitoring, and evaluating model training experiments.
  • Manage and allocate compute resources such as GPUs, TPUs, and other AI accelerators to meet project requirements.

Requirements

  • Bachelor's Degree AND 4+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 2+ years of experience managing cross-functional and/or cross-team projects.
  • 2+ years of experience in end-to-end infrastructure projects, from planning to execution and delivery, in a fast-paced environment.
  • Demonstrated ability to work with engineering, research, and operations teams to align on infrastructure needs, timelines, and project requirements.
  • Ability to proactively identify risks, create mitigation strategies, and handle project roadblocks to ensure timely project delivery.
  • Deep understanding of cloud infrastructure, data centers, networking, and storage systems relevant to compute-intensive AI applications.

Nice-to-haves

  • Bachelor's Degree AND 8+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • Knowledge of GPUs, TPUs, CPUs, and other accelerators commonly used in AI model training.
  • Familiarity with scripting languages (e.g., Python, Bash) for automating infrastructure processes.
  • Understanding of MLOps practices to support continuous model training and deployment.
  • Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.

Benefits

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investment options
  • Maternity and paternity leave
  • Generous time away
  • Giving program
  • Opportunities to network and connect