Evolent Health - Phoenix, AZ
posted about 2 months ago
As an Associate Site Reliability Engineer at Evolent, you will play a crucial role in managing our extensive application suite and cloud infrastructure. This position is part of the Platform Engineering organization, where you will be instrumental in transforming how we manage cloud infrastructure and application reliability. Your contributions will directly impact our ability to deliver high-quality services to our clients and ensure that our systems are reliable and scalable. You will be working in a collaborative environment, where your insights and expertise will help shape the future of our technology stack.

In this role, you will take ownership of identifying and implementing solutions for recurring application problems, thereby increasing application reliability. You will execute corrective actions identified during post-incident reviews (PIRs) or root cause analyses (RCAs), ensuring that we learn from our experiences and continuously improve our processes. Your participation in incident management and after-hours support will be essential in maintaining the integrity of our systems.

You will also be responsible for maintaining observability solutions that gather and analyze system metrics from our production systems. Identifying performance bottlenecks as part of Application Performance Management (APM) will be a key aspect of your role, and you will work to resolve these issues effectively. Automation will be a significant focus, as you will automate tasks to improve efficiency and reduce manual effort, allowing the team to focus on more strategic initiatives.

Collaboration is vital in this position, as you will work closely with Application Engineering teams and other Site Reliability Engineers (SREs) to ensure the reliability and scalability of our systems. You will have the opportunity to learn and utilize tools such as Terraform and Ansible to provision and manage our infrastructure, further enhancing your technical skill set.