Site Reliability Engineer

$111,300 - $156,500/Yr

NetApp

posted 4 months ago

Full-time - Mid Level
Computer and Electronic Product Manufacturing

About the position

NetApp's Engineering Tools and Services (ETS) organization is responsible for the automation and testing infrastructure utilized by our product development and QA teams. We are looking for a Site Reliability Engineer to join our globally distributed SRE team that supports highly available internal services critical to NetApp's product quality and delivery. A strong candidate would have fundamental expertise with Systems Engineering and Software Engineering practices and principles as well as an understanding of how SRE can be applied to increase service availability. The Site Reliability Engineer will play a crucial role in ensuring the reliability and performance of our services, working closely with development and operations teams to implement best practices in service management and incident response. This position requires a proactive approach to problem-solving and a commitment to continuous improvement in service delivery.

Responsibilities

  • Respond to service outages quickly and document detailed root cause analysis for outage incidents.
  • Contribute to the team's monitoring and alerting strategy by reacting to existing alerts and creating new ones to reduce MTTR for supported services.
  • Enforce SLOs, monitor existing SLIs, and create new SLIs to ensure service reliability.
  • Collaborate with development teams to improve service availability and performance.
  • Utilize Atlassian tools (Jira, Confluence, Bitbucket) for project management and documentation.
  • Manage source code using tools like GitHub, Bitbucket, and Perforce.

Requirements

  • Experience and proficiency with Linux/Unix environments.
  • Programming experience in Python, Go, Perl, Java, C, C++; shell scripting is desirable.
  • Basic proficiency with Relational Databases such as MySQL, MariaDB, PostgreSQL.
  • Experience with incident response and documenting root cause analysis for outages.
  • Comfortable with interrupt-driven workflows.
  • 2 to 5 years of experience in a related field.
  • A bachelor's or master's degree in Computer Science, Computer Engineering, or Information Systems, or equivalent experience.

Benefits

  • Health Insurance
  • Life Insurance
  • Retirement or Pension Plans
  • Paid Time Off (PTO)
  • Various Leave options
  • Performance-Based Incentives
  • Employee stock purchase plan
  • Restricted stock units (RSUs)