Meta - Menlo Park, CA

posted 10 days ago

Full-time - Senior
Web Search Portals, Libraries, Archives, and Other Information Services

About the position

This position will play a critical role in driving end-to-end AI product introductions and AI operations initiatives supporting Meta's growing AI/HPC infrastructure for our Family of Apps. The candidate will oversee the entire program lifecycle, from concept through planning, execution, and monitoring, ensuring successful delivery and implementation. This includes collaborating with cross-functional Engineering teams to define scope, goals, and timelines, as well as leading those teams in delivering business outcomes. The ideal candidate will have experience in AI/HPC product development and operations, a strong understanding of the network communications stack for AI solutions, and excellent communication and leadership abilities.

Responsibilities

  • Lead technical program management of next-generation AI/ML platforms for Meta's Network Infrastructure across multiple areas and locations.
  • Collaborate with Engineering and business owners to define program requirements, set priorities, and establish scope.
  • Manage cross-functional dependencies, risks, and changes effectively by optimizing scope, schedule, and resources.
  • Develop and own communication plans to effectively communicate program status, issues, and risks to stakeholders.
  • Partner with cross-functional teams to drive technical analysis, design, development, testing, implementation, and post-implementation phases.
  • Define and track key metrics and performance indicators and drive cross-functional execution of program deliverables.
  • Proactively identify and analyze complex, long-term infrastructure problems with engineering leaders and stakeholders.
  • Drive internal and external process improvements across multiple teams and functions, including reducing manual efforts through automation.
  • Build strong and aligned program teams to efficiently deliver on shared goals.

Requirements

  • B.S. in Computer Science or a related technical discipline, or equivalent experience.
  • 12+ years of software engineering, systems engineering, hardware engineering, or technical product/program management experience.
  • 8+ years of experience delivering network solutions or programs for data center applications.
  • Experience delivering tech programs or products from inception to delivery.
  • Experience operating autonomously across multiple teams, demonstrated critical thinking, and thought leadership.
  • Communication experience and experience working with technical management teams to develop systems, solutions, and products.
  • Analytical and problem-solving experience with large-scale systems.
  • Experience establishing work relationships across multi-disciplinary teams and multiple partners in different time zones.
  • Understanding of the network communication stack and network hardware (NICs, optics, and switches).
  • Experience developing and delivering AI cluster solutions for training and inference use cases.

Nice-to-haves

  • Experience with network protocols (RoCE, InfiniBand, Ethernet).
  • Experience working with large-scale distributed systems.
  • Experience with data center architecture and deployment.
  • Experience working with ODMs and silicon vendors.
  • Experience with AI training and inference model deployments to physical infrastructure.

Benefits

  • Bonus
  • Equity
  • Health benefits
  • Paid time off