Site Reliability Engineer

$175,800 - $264,200/Yr

Apple - Sunnyvale, CA

posted 3 months ago

Full-time - Senior

Computer and Electronic Product Manufacturing

About the position

Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don't just create products - they create the kind of wonder that's revolutionized entire industries. It's the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.

Apple's Manufacturing Systems & Infrastructure (MSI) team is responsible for gathering, consolidating, and tracking all manufacturing data for Apple's products and modules worldwide. This data is used throughout the company and across the product lifecycle, from validating that units being built are fully tested and of high quality before they leave the factory, all the way through to warranty support for customers.

As a Senior Site Reliability Engineer, you will play a critical role in maintaining and enhancing the reliability of our production systems. You will collaborate with engineering teams to design, implement, and monitor infrastructure and services, applying your expertise in automation and performance optimization.

Responsibilities

Design, develop, and maintain scalable, reliable, and efficient infrastructure.
Implement monitoring, alerting, and logging systems to ensure the health and performance of applications.
Automate repetitive tasks and improve system efficiency through scripting and tool development.
Collaborate with development teams to improve service reliability and promote best practices in software development and deployment.
Conduct root cause analysis of system failures and implement corrective actions to prevent recurrence.
Participate in on-call rotations and respond to incidents, minimizing downtime and impact on users.
Drive continuous improvement initiatives to enhance system performance, scalability, and reliability.
Mentor and provide guidance to junior team members, fostering a culture of learning and innovation.

Requirements

Experience in designing and maintaining scalable infrastructure.
Proficiency in monitoring, alerting, and logging systems.
Strong scripting skills for automation and tool development.
Experience collaborating with development teams on service reliability.
Ability to conduct root cause analysis and implement corrective actions.
Experience in incident response and minimizing downtime.
Knowledge of continuous improvement methodologies.
Mentoring experience with junior team members.

Nice-to-haves

Experience with cloud services and infrastructure as code.
Familiarity with containerization technologies like Docker and Kubernetes.
Knowledge of performance optimization techniques.
Experience with configuration management tools.

Benefits

Comprehensive medical and dental coverage
Retirement benefits
Discounted products and free services
Reimbursement for certain educational expenses including tuition
Discretionary bonuses or commission payments
Relocation assistance
Participation in employee stock programs
