Insulet Corporation

posted 7 days ago

Full-time - Senior
Remote
Miscellaneous Manufacturing

About the position

The Senior Principal Site Reliability Engineer (SRE) at Insulet is a pivotal role focused on ensuring the reliability, scalability, and performance of critical systems and services. This position involves leading the adoption of SRE practices, developing automation tools, and collaborating with cross-functional teams to enhance system reliability and operational efficiency. The ideal candidate will have a strong software engineering background and a passion for automation. Responsibilities include incident response, performance analysis, and mentoring junior engineers.

Responsibilities

  • Lead the adoption and implementation of SRE practices across the organization, promoting a culture of reliability and continuous improvement.
  • Develop and implement automation tools and frameworks to enhance system reliability and operational efficiency.
  • Design and maintain comprehensive monitoring and alerting systems to ensure the health and performance of applications and infrastructure.
  • Lead the response to high-severity incidents, conduct root cause analysis, and implement corrective actions to prevent recurrence.
  • Analyze system performance and reliability data to identify areas for improvement and implement optimization strategies.
  • Work closely with development, operations, and product teams to ensure seamless integration of SRE practices and to drive reliability improvements.
  • Mentor and train junior engineers in SRE best practices, and foster a culture of knowledge sharing and continuous learning.
  • Conduct capacity planning and demand forecasting to ensure systems can handle future growth and spikes.
  • Maintain detailed documentation of SRE processes, tools, and best practices to ensure knowledge continuity and operational excellence.

Requirements

  • Experience with observability tools such as Datadog, Prometheus, Dynatrace, Grafana, ELK Stack, or similar.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container and orchestration technologies (e.g., Docker, Kubernetes).
  • In-depth knowledge of AWS services including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route 53.
  • Experience with infrastructure as code tools such as Terraform, Ansible, or similar.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
  • Experience leading and mentoring engineering teams is highly desirable.
  • Knowledge of security best practices and experience implementing security controls and measures.

Nice-to-haves

  • Experience with chaos engineering and resilience testing.
  • Familiarity with AI/ML applications in operational processes.

Benefits

  • Fully remote work available (may work from home or virtually 100% of the time, or hybrid on-site/virtual as desired).
  • Competitive salary range of $163,700.00 - $246,050.00 based on role, level, and location.