Wex Health - Saint Louis, MO

posted 15 days ago

Full-time - Senior
Saint Louis, MO
5,001-10,000 employees
Insurance Carriers and Related Activities

About the position

The Senior Staff Site Reliability Engineer (SRE) at WEX will play a crucial role in enhancing the reliability and performance of the company's Benefits systems. This position involves leading technical initiatives, mentoring other engineers, and ensuring that systems are designed for high availability, scalability, and security. The SRE will collaborate with various engineering teams to implement best practices in observability, incident response, and operational excellence, ultimately improving the quality of service provided to customers and internal stakeholders.

Responsibilities

  • Provide technical guidance and mentorship to other SREs and engineers.
  • Lead the design and implementation of complex systems and solutions.
  • Drive the adoption of SRE best practices across the organization.
  • Architect and implement highly available, scalable, and fault-tolerant systems.
  • Optimize system performance and resource utilization.
  • Proactively identify and mitigate risks to system reliability.
  • Lead incident response efforts, driving efficient resolution and post-incident analysis.
  • Develop and implement processes to improve incident response capabilities.
  • Design and develop automation tools to streamline operational tasks, improve system reliability, and reduce toil.
  • Utilize monitoring and observability tools to gain deep insights into system behavior.
  • Work closely with development teams to ensure software design meets operational requirements.
  • Foster a culture of collaboration and knowledge sharing across teams.
  • Forecast future capacity needs and implement strategies to ensure systems scale efficiently.
  • Continuously identify performance bottlenecks and lead efforts to optimize system performance.
  • Champion security best practices and ensure that systems are designed and operated in compliance with industry standards and regulations.
  • Stay current with emerging technologies and industry trends.

Requirements

  • 7+ years of hands-on experience as a Site Reliability Engineer or equivalent role.
  • 7+ years of development experience with at least one major programming language.
  • Expert-level knowledge of cloud computing platforms (AWS and Azure).
  • Proven ability to lead complex technical projects and initiatives.
  • Strong communication and collaboration skills, with the ability to influence and build consensus.
  • Deep understanding of observability, logging, and monitoring technologies.
  • Experience with a variety of RDBMS and NoSQL data stores.
  • Expertise in containerization technologies such as Docker and Kubernetes.
  • Expertise in infrastructure as code.
  • Experience designing and building RESTful APIs.
  • Extensive hands-on experience with tooling such as Datadog or Splunk.
  • Familiarity with Agile methodologies and practices.
  • Extensive experience in providing and leading critical application support in a 24/7/365 high-availability environment.
  • Experience with GitOps.
  • BA/BS degree in Computer Science or related technical field, or equivalent job experience.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • Retirement savings plan
  • Paid time off
  • Health savings account
  • Flexible spending accounts
  • Life insurance
  • Disability insurance
  • Tuition reimbursement