General Motors - Sunnyvale, CA

posted 4 days ago

Full-time - Principal
Sunnyvale, CA
Transportation Equipment Manufacturing

About the position

The Principal Site Reliability Engineer (SRE) at General Motors is a hybrid role focused on enhancing the reliability, efficiency, and scalability of distributed systems within the automotive industry. This position requires a blend of software and systems engineering skills to maintain the health of infrastructure while optimizing for reliability and cost-efficiency. The successful candidate will work closely with software development teams, driving improvements and automating processes to ensure high-quality service delivery. This is an individual contributor role that emphasizes hands-on involvement in troubleshooting, incident response, and implementing observability frameworks.

Responsibilities

  • Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
  • Lead, implement, and improve monitoring and observability frameworks for proactive incident detection and resolution.
  • Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime.
  • Collaborate with developers to ensure the quality, scalability, and reliability of services, fostering a shared ownership culture.
  • Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to set and meet reliability expectations.
  • Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence.
  • Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability.

Requirements

  • Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
  • Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
  • Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures.
  • Experience handling production incidents, including root cause analysis and mitigation.
  • Strong communication skills to explain technical concepts to both engineering and business stakeholders.
  • Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.
  • Bachelor's degree in computer science or related field, or equivalent work experience.

Nice-to-haves

  • Experience with cloud platforms (AWS, GCP, Azure).
  • Familiarity with container orchestration systems like Kubernetes.
  • A track record of managing or developing distributed systems.
  • Prior experience with Java in production.
  • 8+ years of experience.

Benefits

  • Medical, dental, vision insurance
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan with company matching contributions
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation and holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts