Technical Lead SRE, Storage

$175,800 - $312,200/Yr

Apple - Monte Vista, CA

posted 2 months ago

Full-time - Senior
Computer and Electronic Product Manufacturing

About the position

Apple is where individual imaginations gather together, committing to the values that lead to phenomenal work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something: you'll add something!

Apple Services Engineering (ASE) is vast, and the infrastructure teams in ASE are passionate about delivering the key building blocks for Apple Cloud. The storage SRE teams of ASE build and operate the next generation of distributed storage systems that support Apple's most critical services. Operating at our scale, across multiple geographically dispersed data centers, and serving users with exceptionally large volumes of data presents unique challenges. As a storage SRE at Apple, you'll need to solve these problems using your deep understanding of storage, data analysis, programming, and Linux system internals, along with strong teamwork.

Storage SREs at Apple work across the full infrastructure stack: from tuning the block storage layer to managing content delivery network traffic. In this role, you'll learn how storage services work at Apple and have unique opportunities to improve them. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded!

We are looking for a technical lead seasoned in software and systems to join the Storage SRE team at Apple.
The role involves a tremendous amount of individual responsibility as well as influence over the direction of Apple's distributed storage services, and it will help shape the storage infrastructure that many critical Apple Cloud services will use for years to come.

Responsibilities

  • Architectural and technical leadership for operating large-scale distributed storage systems.
  • Identify innovative approaches to solve problems in distributed systems, strategize ideas, and take them to completion.
  • Work across team boundaries to drive best practices in resiliency for distributed storage services.
  • Establish & enhance Site Reliability Engineering practices across teams.
  • Design, develop, review, and release code in one or more of Go, Rust, Java, and Python.

Requirements

  • Minimum of 8 years of experience in a Site Reliability Engineering, Storage Software Development, or Infrastructure Software Development role.
  • Experience in building, operating, and scaling distributed storage systems in a private, public, or hybrid cloud environment.
  • Good understanding of block, object, and file storage solutions in the industry (such as XFS, ext4, S3, EBS, Ceph, Gluster, NFS).
  • Understanding of Linux internals, standard networking protocols, and distributed systems.
  • Experience with provisioning, data migration, backup & recovery, at-scale testing, disaster recovery, and capacity planning.
  • A strong drive to automate manual operations and to improve them with well-defined and tested APIs.
  • BS or MS in Computer Science or equivalent industry experience.

Nice-to-haves

  • Awareness of best practices for deploying storage systems, including the implications of physical and virtual deployment models for change management, failure domains, hardware lifecycle management, etc.
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks.
  • Experience with SRE principles, such as monitoring, alerting, error budgets, fault analysis, and other common concepts in reliability engineering.
  • Skilled at identifying opportunities to reduce manual work through improvements in code and processes.
  • Familiarity with relational & non-relational databases (such as Cassandra, Postgres, & RocksDB).

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • A range of discounted products and free services
  • Reimbursement for certain educational expenses - including tuition
  • Discretionary bonuses or commission payments
  • Relocation assistance