Site Reliability Engineer

$72,100 - $158,620/Yr

CVS Health - Woonsocket, RI

posted 3 months ago

Full-time - Mid Level
Woonsocket, RI
Health and Personal Care Retailers

About the position

The Site Reliability Engineer (SRE) position at CVS Health is a critical role within the PCW Pharmacy Technology Site Reliability Engineering team. This position focuses on enhancing the reliability and stability of the application portfolio, ensuring seamless experiences for consumers. The ideal candidate will be a highly technical visionary, dedicated to continuous improvement through automation, performance enhancements, and innovation. The SRE will work closely with various teams, including Product, Engineering, Infrastructure, and Service Management, to influence key decisions and establish strong partnerships.

In this role, the SRE will be responsible for identifying, maintaining, and managing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and operational Key Performance Indicators (KPIs). A proactive approach is essential, as the SRE will review the existing environment and engage in enhancements or new services to identify and remediate stability, reliability, and performance improvement opportunities. Continuous monitoring of system telemetry and alerting will be crucial to ensure actionable engagement by operations teams. The SRE will also be tasked with identifying and developing automation solutions to preemptively address potential problems before they lead to service interruptions.

Investigating the root causes of major incidents, formulating remediation plans, and sharing knowledge across platforms will be key responsibilities. Additionally, the SRE will provide technical coaching and direction to organizational resources, stay current with emerging technologies and market trends, and frequently review capacity models to ensure production results remain within expected bounds. Ensuring that incident response processes and associated playbooks are current and effective is also a vital part of this role.

Responsibilities

  • Identify, maintain, and manage SLOs, SLIs, and operational KPIs.
  • Establish and maintain strong partnerships with Product, Engineering, Infrastructure, and Service Management teams.
  • Proactively review the existing environment and engage in enhancements or new services to identify and remediate stability, reliability, and performance improvement opportunities.
  • Continuously review system telemetry and alerting to ensure actionable engagement by operations teams.
  • Identify and develop automation solutions to address potential problems before they result in a service interruption.
  • Investigate root causes of major incidents, identify remediation plans, and share knowledge across platforms.
  • Provide technical coaching and direction to organizational resources.
  • Stay current with emerging technologies and market trends to best position the organization.
  • Review capacity models frequently to ensure production results are within expected bounds.
  • Ensure incident response processes and associated playbooks are current and effective.

Requirements

  • 3+ years of experience in a Site Reliability Engineer or Application Operations role.
  • 2+ years of demonstrated experience scripting or developing software in languages such as Java and Python.
  • 2+ years of experience managing and improving cloud-deployed services on platforms such as AKS and GCP, as well as monolithic systems.
  • 2+ years of experience with configuring, customizing, and extending monitoring platforms such as AppDynamics, Splunk, Grafana, ELK, or similar.

Nice-to-haves

  • Experience managing version control systems such as Git.
  • Experience with tools such as Jenkins and Harness.
  • Continuous-improvement mindset, from ideation through implementation.
  • Ability to engage cross-functional teams to champion the resolution of issues and design solutions.
  • Strong communication, organizational, analytical, and problem-solving skills.
  • Knowledge of IT Service Management best practices such as change management and problem management.

Benefits

  • Full range of medical, dental, and vision benefits.
  • 401(k) retirement savings plan.
  • Employee Stock Purchase Plan.
  • Fully paid term life insurance plan.
  • Short-term and long-term disability benefits.
  • Numerous well-being programs.
  • Education assistance and free development courses.
  • CVS store discount and discount programs with participating partners.
  • Paid Time Off (PTO) or vacation pay, as well as paid holidays throughout the calendar year.