SR. SITE RELIABILITY ENGINEER

$110,000 - $130,000/Yr

Learfield - Syracuse, NY

posted 3 months ago

Full-time - Mid Level
Syracuse, NY
Performing Arts, Spectator Sports, and Related Industries

About the position

As a Senior Site Reliability Engineer at Learfield, you will play a crucial role in ensuring the reliability, availability, and performance of our services. You will work in cross-discipline teams, collaborating closely with our domain engineering and Site Reliability Engineering teams to architect and maintain live services. Your responsibilities will include planning and forecasting service capacity and demand, analyzing software performance, and tuning systems and software to meet our high standards. You will also be tasked with solving mission-critical incidents and building automation to prevent problem recurrence, effectively automating away all toil associated with operational tasks.

In this role, you will identify root causes of production issues and recommend permanent solutions, ensuring that our systems are robust and resilient. You will set up and improve monitoring systems, including metrics, logs, and alerts, to quickly identify and address issues as they arise. Additionally, you will develop effective documentation, tooling, and alerts to mitigate risks and enhance our operational capabilities.

Security is a top priority, and you will actively participate in efforts to keep our environment secure by reviewing compliance and internal scans, working with development teams to stay ahead of security vulnerabilities. You will also be responsible for developing Run Books for our Level I NOC team to reduce Mean Time to Detection (MTTD) and Mean Time to Recovery (MTTR) for alerts. Participation in an on-call rotation with other members of the Site Reliability Engineering team will be expected, ensuring that we maintain high service levels even during off-hours.

This position offers a unique opportunity to influence the technology growth that impacts millions of customers across the entertainment space, allowing you to grow both our products and your career.

Responsibilities

  • Work in cross-discipline teams to ensure service reliability, availability, and performance
  • Collaborate with domain engineering and Site Reliability Engineering teams to architect and maintain live services
  • Plan and forecast service capacity and demand, analyze software performance, and tune systems and software
  • Solve mission-critical incidents and build automation to prevent problem recurrence; automate away all toil
  • Identify root causes of production issues, and recommend permanent solutions for them
  • Set up and improve monitoring (metrics, logs, alerts, etc.) to identify issues quickly
  • Develop effective documentation, tooling, and alerts to identify and address risks
  • Actively participate in, and offer solutions for, keeping our environment secure
  • Review compliance and internal scans and work with development teams to stay ahead of security vulnerabilities
  • Develop Run Books for the Level I NOC team to reduce MTTD/MTTR for alerts
  • Participate in an on-call rotation with other members of the Site Reliability Engineering team

Requirements

  • Experience with Linux container technologies (Docker, Kubernetes)
  • Experience with public and private clouds: GCP, OpenStack, AWS, and/or Azure
  • Understanding of cloud orchestration frameworks (Terraform, Kubernetes, Argo CD, Spinnaker, etc.) and their role in IT transformation
  • 5+ years' experience working with Linux systems and related tooling (kernel, shell, system libraries, file systems, client-server protocols, etc.)
  • The ability to read/write code fluently in C#, Python, or Go
  • Deep understanding of software development lifecycle including git-based CI and CD pipelines
  • Networking: experience with network theory and protocols, e.g. TCP/IP, UDP, DNS, HTTP, TLS, and load balancing
  • Strong experience in distributed systems architectures - layered, event-driven, service mesh, etc.
  • Familiarity with distributed message buses such as Kafka or Confluent
  • Strong interpersonal and communication skills

Benefits

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Life Insurance
  • Flexible Paid Time Off (including Parental Leave)
  • Paid Holidays
  • 401(k)
  • Short/Long Term Disability