CarGurus • Posted 27 days ago
Hybrid • Boston, MA
Publishing Industries

About the position

As a member of the CarGurus reliability team, the site reliability engineer will be responsible for defining, maintaining, and promulgating best practices and tools for SRE and observability.

Responsibilities

  • Linux administration, site reliability best practices, incident management, critical on-call.
  • Collaborating with Engineering and Product Managers to define SLOs and to monitor well-designed SLIs
  • Embedding with Engineering teams and independently addressing issues or collaborating to improve operational excellence
  • Serving as the primary point of escalation, and participating in the on-call rotation, for major engineering incidents
  • Owning our Incident Response Process, including conducting blameless Postmortems
  • Partnering with Engineering teams to ensure new services are production-ready
  • Championing our organizational standards for architecting, observing, deploying, and scaling our products
  • Evolving and maintaining our tracing, logging, monitoring, alerting, and other observability systems to increase visibility and transparency
  • Educating the company on observability tools and troubleshooting techniques and practices
  • Making data-driven decisions to drive continuous improvement
  • Refusing to accept manual work as a solution to areas of weakness

Requirements

  • Linux administration, SRE theory and vocabulary, basic coding and scripting, production experience, incident management experience.
  • A proven background in software engineering with multiple languages and significant relevant operational experience running revenue-critical services at scale
  • Understanding of technologies beyond coding such as Load Balancing, Configuration Management, Kubernetes, Terraform and Observability Systems
  • Comfort in dealing with Incidents and Availability Issues under pressure
  • Familiarity and experience working with cloud infrastructure in an AWS environment
  • Familiarity with modern Site Reliability Engineering best practices and theory
  • Comfort and skill in written and verbal communication across teams and organizations
  • Excitement about solving puzzles and discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon
  • A bias for action, balanced by the emotional intelligence to approach colleagues with positive regard and an understanding of their challenges and decisions
  • Curiosity and the acceptance that there are always ways to learn and grow
  • The desire to be an active contributor in a collaborative and fast-paced environment

Benefits

  • Equity for all employees, both when they start and as they continue to grow with us.
  • Career development and corporate giving programs.
  • Employee resource groups (ERGs) and communities to help people build connections.
  • Flexible hybrid model and robust time off policies to encourage work-life balance.
  • Daily free lunch.
  • New car discount.
  • Meditation and fitness apps.
  • Commuting cost coverage.