Evolent Health - Helena, MT
posted about 2 months ago
As an Associate Site Reliability Engineer at Evolent, you will play a crucial role in managing our extensive application suite and cloud infrastructure. This position is part of the Platform Engineering organization, where you will collaborate with a talented team dedicated to transforming how cloud infrastructure and application reliability are managed. Your contributions will help ensure that our systems operate smoothly and efficiently, ultimately leading to better health outcomes for our clients.

In this role, you will identify and implement solutions for recurring application problems to improve application reliability. You will execute corrective actions identified during post-incident reviews (PIRs) or root cause analyses (RCAs) and participate in incident management, including after-hours support. You will also maintain observability solutions that gather and analyze metrics from production systems, and you will identify and resolve performance bottlenecks as part of Application Performance Management (APM). Automation will be a key focus of your work: you will automate routine tasks to improve efficiency and reduce manual effort.

Collaboration is essential in this role. You will work closely with Application Engineering teams and other Site Reliability Engineers (SREs) to ensure the reliability and scalability of our systems. You will also have the opportunity to learn and use tools such as Terraform and Ansible for provisioning and managing infrastructure, building your skill set while contributing to the team's success.