Massachusetts Institute of Technology
Posted 19 days ago
Senior
Cambridge, MA
Educational Services

About the position

LEAD SITE RELIABILITY ENGINEER, Office of Research Computing and Data (ORCD), to build and advance SRE functions in collaboration with a diverse team of systems engineers; play a pivotal part in the strategic transformation of infrastructure planning, design, delivery, and operations in support of ORCD's continued growth; and build and foster cross-functional collaboration between engineering and operations teams across MIT, ensuring alignment with institutional objectives and long-term strategic initiatives.

Responsibilities

  • Build and advance SRE functions in collaboration with a diverse team of systems engineers.
  • Play a pivotal part in the strategic transformation of infrastructure planning, design, delivery, and operations.
  • Ensure alignment with institutional objectives and long-term strategic initiatives.

Requirements

  • Bachelor's degree in engineering, computer science, or a related field, or equivalent industry experience.
  • A minimum of seven years of experience in site reliability engineering or a related field.
  • Deep and broad expertise across multiple technical domains, including Linux, networking, and virtualization.
  • Ability to drive innovation in system architecture and lead transformative design initiatives from the ground up.
  • Robust analytical and structured problem-solving skills, coupled with excellent communication and interpersonal abilities.
  • Deep understanding of Linux, LDAP, virtualization, and configuration management in a large Linux-based engineering environment.

Nice-to-haves

  • 10+ years of experience in site reliability engineering.
  • Experience working within an HPC/research computing environment.
  • Ability to analyze network traffic to identify technical issues and suspicious activities.