Apple - San Diego, CA

posted 3 months ago

Full-time - Mid Level
Computer and Electronic Product Manufacturing

About the position

The Atlassian Services Site Reliability Engineer (SRE) role is a critical position within the Software Delivery organization at Apple, which plays a vital role in the software release process. This position is responsible for applying Site Reliability Engineering practices to maintain Atlassian services, which are essential tools for software engineers and project managers involved in developing Apple software for global delivery. The Atlassian Services team focuses on ensuring the reliability and performance of data center applications, enhancing observability of services, responding to incident alerts, and reporting on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to provide visibility across the organization. The SRE role is crucial for maintaining the production systems of key applications such as Bitbucket, Confluence, and Jira, which are integral to delivering cutting-edge operating systems, applications, and firmware to Apple customers.

In this role, the Site Reliability Engineer will be tasked with various responsibilities, including the configuration and monitoring of both on-premises and cloud-based dependencies. The engineer will also automate continuous integration (CI) and continuous delivery (CD) pipelines, maintain staging and production environments with the goal of maximizing uptime, and implement observability systems for effective monitoring, alerting, and metrics reporting. Additionally, the engineer will generate reports on service metrics related to performance, availability, and reliability, and champion best practices in change control management and incident response.

A successful candidate will be expected to proactively communicate the status of Atlassian services to stakeholders and follow through on time-sensitive tasks. They should demonstrate a willingness to seek clarification and increase awareness of the larger context, explore solutions to problems while evaluating risk versus reward, and execute the best approach. Effective asynchronous communication with a global team across multiple time zones is essential, as is the ability to document new processes or update existing documentation. The ideal candidate will be eager and curious to learn across multiple technology stacks, contributing to the overall success of the team and the organization.

Responsibilities

  • Configuration and monitoring of on-prem and cloud-based dependencies
  • Automate continuous integration (CI) and continuous delivery (CD) pipelines
  • Maintain staging and production environments with the goal of maximizing uptime
  • Implement observability of systems for monitoring, alerting, and metrics reporting
  • Generate reports regarding service metrics on performance, availability, and reliability
  • Champion practices regarding change control management and incident response
  • Proactively communicate status of Atlassian services to stakeholders
  • Document new processes or update existing documentation
  • Explore solutions to problems, evaluate risk vs. reward, then execute the best approach
  • Communicate asynchronously with a global team across multiple time zones

Requirements

  • B.S. in Computer Science or related work experience
  • Experience in managing and monitoring fleets of *nix systems or container platforms
  • SRE or DevOps experience in managing customer-facing systems in a 24/7 environment
  • Understanding of distributed systems with respect to application, networking, and security
  • Excellent judgment and integrity with the ability to make timely and sound decisions
  • Ability to anticipate the needs of others and adapt to changing conditions

Nice-to-haves

  • Experience as an SCM administrator (e.g. GitHub, or similar)
  • Experience with container platforms (e.g. Docker, or similar)
  • Experience with monitoring and alerting (e.g. Prometheus, Grafana, or similar)
  • Experience with data analysis (e.g. Splunk, or similar)

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounted products and free services
  • Reimbursement for certain educational expenses, including tuition
  • Opportunity to participate in Apple's discretionary employee stock programs
  • Eligibility for discretionary bonuses or commission payments
  • Relocation assistance