Apple - Austin, TX
posted 2 months ago
As a Site Reliability Engineer (SRE) at Apple, you will play a crucial role in ensuring the reliability, scalability, and performance of the systems and services that integrate Apple Retail Stores and the Apple Online Store with major US carriers for iPhone activations. This position requires extensive hands-on experience as an SRE for large-scale, customer-facing cloud applications. You will collaborate closely with engineering and operations teams to design, build, and maintain robust infrastructure and automation solutions, and you will represent the SRE organization during design reviews and operational readiness exercises for both new and existing services.

You will be expected to analyze system statistics to provide a clear picture of the current state of our systems. A strong understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, is essential. You will also need excellent troubleshooting and problem-solving skills to proactively address critical production issues and to work with the necessary partners to resolve them. Your responsibilities will include automating manual operations and iteratively improving processes. You should have a good understanding of networking and load balancing concepts, and the ability to lead a small team in developing innovative solutions.

Participation in an on-call rotation is required, and you will provide hands-on technical expertise during service-impacting events. This role is ideal for someone who is self-motivated, capable of making business-critical decisions, and comfortable working in a fast-paced, ever-changing environment.