Evolent Health - Olympia, WA
posted about 2 months ago
As an Associate Site Reliability Engineer at Evolent, you will play a crucial role in managing our extensive application suite and cloud infrastructure. This position is part of the Platform Engineering organization, where you will collaborate with a talented team dedicated to transforming the management of cloud infrastructure and application reliability. Your contributions will be instrumental in ensuring that our systems are reliable, scalable, and efficient, ultimately leading to better health outcomes for our clients.

In this role, you will be responsible for identifying and implementing solutions for recurring application problems, which is essential for increasing application reliability. You will execute corrective actions identified during post-incident reviews (PIRs) or root cause analyses (RCAs), ensuring that we learn from incidents and improve our systems. Additionally, you will participate in incident management and provide after-hours support as needed.

Your responsibilities will also include maintaining observability solutions to gather and analyze system metrics from production systems, identifying performance bottlenecks through Application Performance Management (APM), and resolving these issues. Automation will be a key focus of your work, as you will automate tasks to enhance efficiency and reduce manual effort. Collaboration with Application Engineering teams and other Site Reliability Engineers (SREs) will be vital to ensure the reliability and scalability of our systems. You will also have the opportunity to learn and utilize tools like Terraform and Ansible for provisioning and managing infrastructure.