Site Reliability Engineer

University of Michigan
Ann Arbor, MI
430d
$80,762 - $99,765
Remote

About The Position

As a Site Reliability Engineer in the Department of Biostatistics at the University of Michigan, you will be responsible for ensuring the reliability and uptime of critical internal and external services. This role involves monitoring system performance, automating cloud infrastructure, and collaborating with development teams to optimize product reliability. You will play a key role in maintaining system capacity and performance while helping to improve engineering tools and data security.

Requirements

  • 1-3+ years (intermediate) or 3-5+ years (senior) of experience with cloud services (AWS, Google Cloud Platform, Azure) and experience with container orchestration technologies (e.g., Kubernetes, Docker).
  • Expertise in software development in one or more programming languages (Python, Go, Java, etc.).
  • Proficient with Unix/Linux systems, with scripting experience in Shell, Perl or Python.
  • 1-3+ years (intermediate) or 3-5+ years (senior) of experience with infrastructure as code (IaC) using tools like Terraform, Ansible, or Chef.
  • Solid understanding of core internet technologies (e.g., TCP/IP, DNS, SMTP, HTTP, distributed networks), and ability to troubleshoot related issues.
  • Ability to navigate, communicate, and negotiate priorities and technical risk across teams.

Responsibilities

  • Develop, scale, and automate our cloud infrastructure with a focus on efficiency, security, and reliability.
  • Work closely with development teams to ensure that design, testing, and deployment of new products and features are optimized for reliability.
  • Monitor system performance, configure alerts, and respond to incidents.
  • Perform root cause analysis of production errors and resolve technical issues.
  • Implement automation tools for efficient server management and operation.
  • Participate in on-call rotations to handle and resolve high priority incidents.
  • Collaborate with team members to improve our engineering tools, systems, procedures, and data security.
  • Conduct systems tests for security, performance, and availability.
  • Develop and maintain documentation for key systems and processes.

Benefits

  • Generous time off, including family leave
  • A retirement plan that provides two-for-one matching contributions with immediate vesting
  • Many choices for comprehensive health, dental, and vision insurance
  • Life insurance
  • Long-term disability coverage
  • Flexible spending accounts for healthcare and dependent care expenses