Staff Site Reliability Engineer, Infrastructure, Observability

$166,600 - $245,000/Yr

GM Cruise - Phoenix, AZ

posted 3 months ago

Full-time - Mid Level

Transportation Equipment Manufacturing

About the position

The Observability team at Cruise is seeking a Staff Site Reliability Engineer to develop and improve its observability systems, tools, and the associated codebase. The role is pivotal to the reliability, scalability, performance, efficiency, and security of our systems. You will apply your software and systems engineering skills to contribute code, conduct code reviews, and create technical designs that improve the performance and reliability of observability systems. You will proactively identify and address challenges and find new opportunities to improve engineering through observability. You will also partner with Software Engineering teams to understand their use cases and guide them in using existing tools effectively, and you will build tools that enable engineers to collect and act on observability signals.

Responsibilities

Contribute code and perform code reviews to improve observability systems.
Create technical designs that enhance performance and reliability of observability systems.
Proactively identify challenges and opportunities for improvement in engineering through observability.
Collaborate with Software Engineering teams to understand use cases and guide effective tool usage.
Build tools to enable engineers to collect and act on observability signals.

Requirements

Previous experience as an SRE, Production Engineer, Systems Engineer, or Software Engineer focusing on distributed systems reliability.
Considerable experience with container orchestration systems (e.g., Kubernetes).
Proficient in designing and developing sophisticated distributed systems using high-level programming languages such as Go, Python, Rust, C/C++, or JavaScript (Node.js).
Experience in leading or driving a multi-functional effort to implement a new technology or service.
Experience in designing and implementing large-scale systems.
Considerable Linux experience.
Effective collaboration skills to work closely with team members and various engineering teams.

Nice-to-haves

Experience with Cloud Platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
Experience with OpenTelemetry instrumentation.
Familiarity with Kubernetes, Docker, Istio, and Terraform.
Leadership experience.
Skilled in defining and instrumenting SLIs and SLOs.
Previous experience working with Prometheus, Grafana, TSDBs, and observability pipelines.

Benefits

Competitive salary and benefits
Medical, dental, vision, life, and AD&D
Subsidized mental health benefits
Paid time off and holidays
Paid parental, medical, family care, and military leave of absence
401(k) Cruise matching program
Fertility benefits
Dependent Care Flexible Spending Account
Flexible Spending Account & Health Savings Account
Perks Wallet program for benefits/perks
Pre-tax Commuter benefit plan for local employees
CruiseFlex, our location-flexible work policy
