Principal Site Reliability Engineer

$225,000 - $344,800/Yr

General Motors - Sunnyvale, CA

posted 4 days ago

Full-time - Principal

Transportation Equipment Manufacturing

About the position

The Principal Site Reliability Engineer (SRE) at General Motors is a hybrid role focused on enhancing the reliability, efficiency, and scalability of distributed systems within the automotive industry. This position requires a blend of software and systems engineering skills to maintain the health of infrastructure while optimizing for reliability and cost-efficiency. The successful candidate will work closely with software development teams, driving improvements and automating processes to ensure high-quality service delivery. This is an individual contributor role that emphasizes hands-on involvement in troubleshooting, incident response, and implementing observability frameworks.

Responsibilities

Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
Lead, implement, and improve monitoring and observability frameworks for proactive incident detection and resolution.
Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime.
Collaborate with developers to ensure the quality, scalability, and reliability of services, fostering a shared ownership culture.
Manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to set clear reliability expectations.
Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence.
Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability.

Requirements

Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures.
Experience handling production incidents, including root cause analysis and mitigation.
Strong communication skills to explain technical concepts to both engineering and business stakeholders.
Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.
Bachelor's degree in computer science or related field, or equivalent work experience.

Nice-to-haves

Experience with cloud platforms (AWS, GCP, Azure).
Familiarity with container orchestration systems like Kubernetes.
A track record of managing or developing distributed systems.
Prior experience with Java in production.
8+ years of experience.

Benefits

Medical, dental, vision insurance
Health Savings Account
Flexible Spending Accounts
Retirement savings plan with company matching contributions
Sickness and accident benefits
Life insurance
Paid vacation and holidays
Tuition assistance programs
Employee assistance program
GM vehicle discounts
