Site Reliability Engineer

$140,600 - $187,400/Yr

Atlassian - Mountain View, CA

Posted 3 months ago

Full-time - Mid Level

Publishing Industries

About the position

As an Atlassian Site Reliability Engineer (SRE), you will play a crucial role in enhancing the performance and reliability of our services. Your primary focus will be on identifying and addressing the root causes of incidents, thereby reducing incident rates. You will engage deeply with the services we support, taking ownership of problems and their corresponding solutions, while also automating repetitive tasks to improve efficiency. Your responsibilities will include responding to alerts and investigating issues within our systems, allowing you to tackle complex challenges head-on.

The ideal candidate for this position will possess a collaborative spirit, as success in this role is not about having all the answers but rather about working together to find solutions. You will be expected to ask questions, learn from your peers, and transform chaos into order. As part of your duties, you will participate in an on-call rotation to ensure our products meet established Service Level Agreements (SLAs).

This position is well-suited for individuals with creative problem-solving skills and a strong sense of accountability for the code they write, from development to production. In this role, you will develop and implement scalable solutions that directly enhance the reliability of our services. You will take ownership of development efforts throughout each sprint, from planning to delivery, and collaborate with team members to review code. We promise that you will find this role engaging and dynamic, with no shortage of interesting challenges to tackle.

Responsibilities

Improve the performance and reliability of services.
Address root causes of incidents and reduce incident rates.
Deep dive into supported services and own problem-solving efforts.
Automate repetitive tasks to enhance efficiency.
Respond to alerts and investigate system issues.
Engage in capacity planning and demand forecasting.
Conduct software performance analysis and systems tuning.
Maintain high standards of code quality.
Collaborate with team members to review code and ensure quality.

Requirements

Experience writing code in Bash and Python.
Ability to triage and diagnose user-facing service outages.
Experience in capacity planning and demand forecasting.
Proficient in software performance analysis and systems tuning.
Experience configuring and managing enterprise monitoring solutions.
Strong understanding of Linux systems.
Experience building, automating, and maintaining infrastructure in AWS.
Ability to maintain high standards of code quality.

Nice-to-haves

Exposure to configuration management and orchestration tools like Ansible and Puppet.
Experience with microservices architectures and container management tools such as Docker and Kubernetes.
Understanding of ITIL terminology for incident and problem management.
Experience managing and troubleshooting a continuous integration pipeline.
Ability to break down complex projects into manageable tasks.
Familiarity with compliance requirements (SOC 2, FedRAMP, etc.).

Benefits

Health coverage
Paid volunteer days
Wellness resources
