Associate SRE

$71,300 - $88,000/Yr

Evolent Health - Trenton, NJ

posted 3 months ago

Full-time - Entry Level

Trenton, NJ

Professional, Scientific, and Technical Services

About the position

As an Associate Site Reliability Engineer at Evolent, you will play a crucial role in managing our extensive application suite and cloud infrastructure. This position is part of the Platform Engineering organization, which is dedicated to transforming the management of cloud infrastructure and application reliability. Your contributions will be vital in ensuring that our systems operate smoothly and efficiently, ultimately leading to better health outcomes for our clients.

In this role, you will be responsible for identifying and implementing solutions for recurring application problems, thereby enhancing application reliability. You will execute corrective actions identified during post-incident reviews (PIRs) or root cause analyses (RCAs), ensuring that we learn from past incidents to prevent future occurrences. Your participation in incident management and after-hours support will be essential in maintaining the reliability of our systems. You will also maintain observability solutions to gather and analyze system metrics from production systems, identifying performance bottlenecks as part of Application Performance Management (APM) and resolving them effectively.

Automation will be a key focus of your work, as you will be tasked with automating tasks to improve efficiency and reduce manual effort. Collaboration with Application Engineering teams and other Site Reliability Engineers (SREs) will be necessary to ensure the reliability and scalability of our systems. Additionally, you will have the opportunity to learn and utilize tools such as Terraform and Ansible for provisioning and managing infrastructure.

Responsibilities

Own finding and implementing solutions for recurring application problems to increase application reliability.
Execute all corrective actions identified as part of post-incident reviews (PIRs) or root cause analyses (RCAs).
Participate in incident management and after-hours support.
Maintain observability solutions to gather and analyze system metrics from production systems.
Identify performance bottlenecks as part of APM and resolve them.
Automate tasks to improve efficiency and reduce manual effort.
Collaborate with Application Engineering teams and SREs to ensure the reliability and scalability of systems.
Learn and use Terraform and Ansible to provision and manage infrastructure.

Requirements

2+ years of hands-on Azure experience and 3+ years of overall cloud-native experience.
DevOps/SRE mindset with the ability to identify opportunities for automation and act on them.
Expertise in at least one scripting or configuration language: PowerShell, YAML, HCL, or Python.
Expertise in at least one APM tool: Prometheus, Dynatrace, Application Insights, or Datadog.
Experience using an agile methodology (e.g., Scrumban) to manage project work.
Excellent communication skills and comfort with a high level of transparency.
Willingness to learn new technologies.
Team player.

Benefits

Comprehensive health insurance benefits
Bonus component based on pre-defined performance factors
Work/life balance and flexibility
Diversity and inclusion initiatives
