SRE Cloudstack

Apple - Cupertino, CA

posted 2 months ago

Full-time

Cupertino, CA

Computer and Electronic Product Manufacturing

About the position

As a Site Reliability Engineer (SRE) at Apple, you will play a crucial role in supporting and scaling cloud services that cater to thousands of development and operations engineers. This position is part of Apple's Cloud Service Infrastructure team, where you will be responsible for establishing SRE practices for a private cloud service. Your work will directly impact the reliability and consistency of application delivery across the organization. This hands-on role requires a self-motivated individual with a passion for excellence, quality, and detail. You will not only support operations but also collaborate closely with developers and architects to design and implement solutions that enhance stability, security, and scalability.

In this role, you will operate, monitor, and triage all aspects of both production and non-production environments. You will pioneer and implement the next-generation compute platform, and you will prepare alert handling procedures and runbooks in collaboration with off-shore SRE teams. Automation will be a key focus: you will automate the deployment and orchestration of services into the cloud environment, along with other routine processes. Your participation in workload balancing, scale testing, and disaster recovery exercises will be essential to ensuring the robustness of the cloud services. Additionally, you will work closely with partner teams, including engineering, QA, and program management, and nurture relationships with internal and external third-party vendors.

Responsibilities

Operate, monitor, and triage all aspects of production and non-production environments.
Pioneer and implement the next-generation compute platform.
Prepare alert handling procedures and runbooks, collaborating with off-shore SRE teams.
Automate deployment and orchestration of services into the cloud environment and other routine processes.
Actively participate in workload balancing, scale testing, and disaster recovery exercises.
Support partner teams, including engineering, QA, and program management.
Manage relationships with internal and external third-party vendors.

Requirements

Demonstrated proficiency in managing cloud operations, focusing on infrastructure-as-a-service (compute, storage, and network virtualization).
Coding experience using high-level programming languages like Java, Golang, and Python.
Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.).
Experience with GitOps, CI/CD tools such as Spinnaker and Argo, and deployment strategies.
Experience with Linux system virtualization (Libvirt, QEMU, KVM, etc.), including knowledge of relevant APIs and programming languages.
Familiarity with SQL-like database queries is a plus.
Experience with Infrastructure as a Service orchestration tools (OpenStack, CloudStack, etc.) is a plus.
Solid grasp of Linux system administration.
Understanding of advanced telemetry and observability for services at different levels (API, runtime, infrastructure, log analysis, etc.).
Working understanding of common authentication schemes, certificates, and securely handling secrets.
Minimum of 5 years of industry experience.

Nice-to-haves

B.S. in computer science or a similar field, or equivalent experience.