SRE - Site Reliability Engineer

Diverse Lynx - New York, NY

posted 8 months ago

Full-time

New York, NY

Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) is responsible for the reliability, availability, and performance of our systems and services. The SRE owns operational automation and monitoring and acts as the Subject Matter Expert (SME) for identifying toil in existing systems and processes. The primary goal is to implement automated solutions that reduce that toil, improving operational efficiency and service reliability.

The ideal candidate has strong cloud engineering experience, particularly with Google Cloud Platform (GCP), and can define Customer User Journeys (CUJ), Service Level Objectives (SLO), Service Level Indicators (SLI), and error budgets based on Non-Functional Requirements (NFR). A solid understanding of Infrastructure as Code (IaC) tools such as Terraform, along with version control using Git and GitHub, is essential.

The SRE is also expected to have hands-on experience with containerization technologies, particularly Kubernetes, and to design and implement automated workflows that streamline operations. Proficiency in scripting and automation tools such as Bash, PowerShell, Python, and Ansible is required. The role centers on reducing toil in Software Development Life Cycle (SDLC) or IT operations environments so that our systems are reliable, efficient, and scalable.

Responsibilities

Act as the Subject Matter Expert (SME) on operational automation and monitoring.
Identify and reduce toil within existing systems and processes.
Implement automated solutions to enhance operational efficiency.
Define and create Customer User Journeys (CUJ), Service Level Objectives (SLO), Service Level Indicators (SLI), and Error Budgeting based on Non-Functional Requirements (NFR).
Design and implement automated workflows to streamline operations.
Utilize Infrastructure as Code (IaC) tools such as Terraform for system management.
Work with version control systems like Git and GitHub for source code management.
Manage and optimize containerized applications using Kubernetes.

Requirements

Cloud engineering experience, particularly with Google Cloud Platform (GCP).
Strong knowledge of Infrastructure as Code (IaC) and related tooling, such as Terraform, GitHub, and Docker images.
Proficiency in scripting and automation tools including Bash, PowerShell, Python, and Ansible.
Experience with container orchestration using Kubernetes.
Good understanding of Software Configuration Management (SCM) and code-quality tools such as Git, GitHub, and SonarQube.
Experience in reducing toil in Software Development Life Cycle (SDLC) or IT operations environments.
