Associate SRE

$71,300 - $88,000/Yr

Evolent Health - Atlanta, GA

posted 3 months ago

Full-time - Entry Level
Atlanta, GA
Professional, Scientific, and Technical Services

About the position

As an Associate Site Reliability Engineer at Evolent, you will play a crucial role in managing our extensive application suite and cloud infrastructure. This position is part of the Platform Engineering organization, where you will be instrumental in transforming how we manage cloud infrastructure and application reliability. Your contributions will directly impact our ability to provide high-quality care to individuals with complex health conditions. We are looking for someone who is eager to join a talented team and is passionate about improving application reliability and performance.

In this role, you will be responsible for identifying and implementing solutions for recurring application problems, which is essential for increasing application reliability. You will execute corrective actions identified during post-incident reviews (PIRs) or root cause analyses (RCAs) and participate in incident management, including after-hours support. Your expertise will help maintain observability solutions that gather and analyze system metrics in production systems, allowing us to identify and resolve performance bottlenecks effectively.

Automation is a key focus of this position, and you will be expected to automate tasks to improve efficiency and reduce manual effort. Collaboration is also vital, as you will work closely with Application Engineering teams and other Site Reliability Engineers (SREs) to ensure the reliability and scalability of our systems. Additionally, you will have the opportunity to learn and utilize tools like Terraform and Ansible to provision and manage our infrastructure, further enhancing your skill set and contributing to our team's success.

Responsibilities

  • Own the identification and implementation of solutions for recurring application problems to increase application reliability.
  • Execute all corrective actions identified as part of post-incident reviews (PIRs) or root cause analyses (RCAs).
  • Participate in incident management and after-hours support.
  • Maintain observability solutions that gather and analyze system metrics from production systems.
  • Identify performance bottlenecks through application performance monitoring (APM) and resolve them.
  • Automate tasks to improve efficiency and reduce manual effort.
  • Collaborate with Application Engineering teams and SREs to ensure the reliability and scalability of systems.
  • Learn and use Terraform and Ansible to provision and manage infrastructure.

Requirements

  • 2+ years of hands-on Azure experience and 3+ years of overall cloud-native experience.
  • DevOps/SRE mindset with the ability to identify automation opportunities and act on them.
  • Expertise in at least one scripting or configuration language: PowerShell, YAML, HCL, or Python.
  • Expertise in at least one APM tool: Prometheus, Dynatrace, Application Insights, or Datadog.
  • Experience leveraging agile methodology (e.g., Scrumban) to manage project work.
  • Excellent communication skills and comfort with a high level of transparency.
  • Willingness to learn new technologies.
  • Team player.

Benefits

  • Comprehensive health insurance benefits
  • Bonus component based on pre-defined performance factors
  • Work/life balance and flexibility
  • Autonomy in work
  • Diversity and inclusion initiatives