Site Reliability Engineer

$130,000 - $155,000/Yr

Major League Soccer - New York, NY

posted 2 months ago

Full-time - Senior

Remote - New York, NY

Performing Arts, Spectator Sports, and Related Industries

About the position

We are seeking a Site Reliability Engineer (SRE) to lead and mentor our SRE and TechOps teams with a focus on automation to drive accountability, efficiency, and continuous improvement. This role involves building and maintaining observability frameworks to ensure system reliability, performance, and scalability, while fostering a culture of innovation through iterative enhancements. The SRE will ensure smooth platform operations, drive automation, and streamline workflows to reduce manual interventions, supporting both gameday and non-gameday activities.

Responsibilities

Develop and implement observability frameworks to monitor the health and performance of services, ensuring uptime and reliability.
Serve as the first line of defense for incidents, troubleshooting and resolving them through strong problem-solving skills rather than relying on runbooks.
Perform thorough API testing for published content using tools like Postman and Cypress to ensure accuracy and performance.
Use Terraform to manage infrastructure, including ServiceNow integrations, and to automate workflows.
Leverage Datadog or equivalent tools to set up monitoring, logging, and alerting systems.
Work closely with cross-functional teams to ensure seamless integration and deployment of services.
Manage and optimize AWS resources, including EKS and ECS, to ensure scalability and cost-efficiency.
Use GitLab pipelines for continuous integration and deployment, ensuring smooth and automated delivery of code changes.
Integrate tools like ServiceNow with Slack or Asana to streamline workflows and enhance team communication.
Lead and manage a team of highly skilled consultants and full-time professionals, cultivating a culture of innovation, accountability, and continuous improvement.

Requirements

Bachelor's degree in Computer Science, Information Technology, or a related field.
7+ years of experience, including 5+ years in cloud and technical operations.
Proven background in architecting and managing cloud solutions (AWS, Azure, Google Cloud).
Hands-on experience in complex technology operations environments, including infrastructure, network, security, and incident management.
2+ years in management or mentoring roles within technology operations (ITSM/ITOM) or a related field.
Proficiency in implementing automation tools and driving automation excellence within the organization.

Nice-to-haves

Advanced degrees or certifications (e.g., ITIL, AWS, Azure).
Familiarity with GCP and Azure.
Experience with Go, React/React Native.
Experience with ETL between third-party systems.

Benefits

Comprehensive and competitive medical, dental, and vision benefits.
$500 Wellness Reimbursement.
Generous PTO offering.
Hybrid Office/Remote Work Schedule.
On-the-job training and ongoing educational opportunities.
Office perks, discounts, and employee events.
