Atlassian Services Site Reliability Engineer

$129,600 - $236,300/Yr

Apple - San Diego, CA

posted 3 months ago

Full-time - Mid Level

Computer and Electronic Product Manufacturing

About the position

The Atlassian Services Site Reliability Engineer (SRE) role is a critical position within the Software Delivery organization at Apple, which plays a vital role in the software release process. This position is responsible for applying Site Reliability Engineering practices to maintain Atlassian services, which are essential tools for software engineers and project managers involved in developing Apple software for global delivery. The Atlassian Services team focuses on ensuring the reliability and performance of data center applications, enhancing observability of services, responding to incident alerts, and reporting on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to provide visibility across the organization. The SRE role is crucial for maintaining the production systems of key applications such as Bitbucket, Confluence, and Jira, which are integral to delivering cutting-edge operating systems, applications, and firmware to Apple customers.

In this role, the Site Reliability Engineer will be tasked with various responsibilities, including the configuration and monitoring of both on-premises and cloud-based dependencies. The engineer will also automate continuous integration (CI) and continuous delivery (CD) pipelines, maintain staging and production environments with the goal of maximizing uptime, and implement observability systems for effective monitoring, alerting, and metrics reporting. Additionally, the engineer will generate reports on service metrics related to performance, availability, and reliability, and champion best practices in change control management and incident response.

A successful candidate will be expected to proactively communicate the status of Atlassian services to stakeholders and follow through on time-sensitive tasks. They should demonstrate a willingness to seek clarification and increase awareness of the larger context, explore solutions to problems while evaluating risk versus reward, and execute the best approach. Effective asynchronous communication with a global team across multiple time zones is essential, as is the ability to document new processes or update existing documentation. The ideal candidate will be eager and curious to learn across multiple technology stacks, contributing to the overall success of the team and the organization.

Responsibilities

Configure and monitor on-prem and cloud-based dependencies
Automate continuous integration (CI) and continuous delivery (CD) pipelines
Maintain staging and production environments with the goal of maximizing uptime
Implement observability of systems for monitoring, alerting, and metrics reporting
Generate reports regarding service metrics on performance, availability, and reliability
Champion practices regarding change control management and incident response
Proactively communicate status of Atlassian services to stakeholders
Document new processes or update existing documentation
Explore solutions to problems, evaluate risk vs reward, then execute best approach
Communicate asynchronously with a global team across multiple time zones

Requirements

B.S. in Computer Science or related work experience
Experience in managing and monitoring fleets of *nix systems or container platforms
SRE or DevOps experience in managing customer-facing systems in a 24/7 environment
Understanding of distributed systems with respect to application, networking, and security
Excellent judgment and integrity with the ability to make timely and sound decisions
Ability to anticipate the needs of others and adapt to changing conditions

Nice-to-haves

Experience as an SCM administrator (e.g. GitHub, or similar)
Experience with container platforms (e.g. Docker, or similar)
Experience with monitoring and alerting (e.g. Prometheus, Grafana, or similar)
Experience with data analysis (e.g. Splunk, or similar)

Benefits

Comprehensive medical and dental coverage
Retirement benefits
Discounted products and free services
Reimbursement for certain educational expenses, including tuition
Opportunity to participate in Apple's discretionary employee stock programs
Eligibility for discretionary bonuses or commission payments
Relocation assistance
