CarGurus, posted 27 days ago
Mid Level
Boston, MA

About the position

As a member of the CarGurus reliability team, the site reliability engineer will be responsible for defining, maintaining, and promulgating best practices and tools for SRE and observability.

Responsibilities

  • Linux administration, site reliability best practices, incident management, and critical on-call duty.
  • Collaborating with Engineering and Product Managers to define SLOs and monitor well-designed SLIs.
  • Embedding with Engineering teams and independently addressing issues or collaborating to improve operational excellence.
  • Serving as the primary point of escalation and participating in the on-call rotation for major engineering incidents.
  • Owning our Incident Response Process, including conducting blameless Postmortems.
  • Partnering with Engineering teams to ensure new services are production-ready.
  • Championing our organizational standards for architecting, observing, deploying, and scaling our products.
  • Evolving and maintaining our tracing, logging, monitoring, alerting, and other observability systems to increase visibility and transparency.
  • Educating the company on observability tools and troubleshooting techniques and practices.
  • Making data-driven decisions to drive continuous improvement.
  • Refusing to accept manual work as a solution to areas of weakness.

Requirements

  • Linux administration, SRE theory and vocabulary, basic coding and scripting, production experience, incident management experience.
  • A proven background in software engineering with multiple languages and significant relevant operational experience running revenue-critical services at scale.
  • Understanding of technologies beyond coding such as Load Balancing, Configuration Management, Kubernetes, Terraform and Observability Systems.
  • Comfort in dealing with Incidents and Availability Issues under pressure.
  • Familiarity and experience working with cloud infrastructure in an AWS environment.
  • Familiarity with modern Site Reliability Engineering best practices and theory.
  • Comfort and skill in written and verbal communication across teams and organizations.
  • Excitement about solving puzzles and discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon.
  • A bias for action, but sufficient emotional intelligence to approach colleagues with positive regard and to understand their challenges and decisions.
  • Curiosity and the acceptance that there are always ways to learn and grow.
  • The desire to be an active contributor in a collaborative and fast-paced environment.

Benefits

  • Equity for all employees, both when they start and as they continue to grow with us.
  • Career development and corporate giving programs.
  • Employee resource groups (ERGs) and communities.
  • Flexible hybrid model.
  • Robust time off policies.
  • Daily free lunch.
  • New car discount.
  • Meditation and fitness apps.
  • Commuting cost coverage.