Apple - Cupertino, CA

posted 2 months ago

Full-time
Cupertino, CA
Computer and Electronic Product Manufacturing

About the position

The Site Reliability Engineer (SRE) - Platform Cloud role at Apple focuses on developing processes, tools, and automation for managing distributed systems in production environments. The SRE team combines software and systems engineering with system administration practices to build and maintain large-scale, fault-tolerant systems that ensure the reliability, scalability, and security of Apple's services. This position involves collaboration across various teams to create platforms capable of rapidly scaling to serve data with low latencies, while also fostering a culture of continuous learning and technical innovation.

Responsibilities

  • Develop processes, tools, and automation for managing distributed systems in production environments.
  • Collaborate cross-functionally with various teams to define metrics and set performance targets.
  • Uncover optimization opportunities and define quality guardrails for services and applications.
  • Build next-generation search infrastructure and platform services.
  • Ensure the reliability, scalability, and security of Apple's services.

Requirements

  • Proficiency in modern Java; Python is optional.
  • Experience with at least one scalable search or streaming platform such as Solr, Kafka, Elasticsearch, or OpenSearch.
  • Strong production debugging and performance tuning skills.
  • Deep understanding of the Linux operating system, including the kernel, memory management, processes, and threads.

Nice-to-haves

  • Experience with EC2, EBS, and Terraform.
  • Experience running stateful services on Kubernetes.