Staff Site Reliability Engineer

Abbott Laboratories • posted about 1 month ago

$112,000 - $224,000/Yr

Full-time • Senior

Pleasanton, CA

Miscellaneous Manufacturing

About the position

As a senior member of Site Reliability Engineering, you will play a critical role in establishing and executing a site reliability strategy for the Heart Failure Division Medical Device Mobile and Cloud Digital Software portfolio (both Class II and Class III). You will partner with and influence our Architecture and Engineering teams in delivering highly resilient software solutions for our customers. You will be responsible for implementing SRE improvement processes and procedures and for influencing change within the organization. You will need a strong software engineering background in a highly secured environment, along with experience in DevOps, formal test automation, load testing, or SRE. You will need extensive technical knowledge in the development, delivery, and implementation of highly complex and critical software systems. Expertise in the value and principles of SRE (SLIs/SLOs, error budgets, toil, observability, release engineering) is critical for success in this role. You will have demonstrated the ability to develop, communicate, and execute your vision, resulting in the adoption of practices and tooling that will help strengthen our position as the premier leader in the Heart Failure business.

Responsibilities

Develop and enable a culture of SRE in our software development, delivery and operational practices.
Implement a comprehensive SRE strategy and roadmap in partnership with the HF Digital team.
Identify critical KPIs and metrics and execute on the SRE roadmap.
Help software engineering teams and business stakeholders establish and evolve reliability goals and measure progress against those goals using SLIs/SLOs.
Automate manual processes via scripting and/or tools. Create automated regression and sustaining test suites and manage their continuous execution.
Work closely with the Development, Network, Infrastructure, and Requirements teams on critical issues: evaluate the problem, propose and implement resolutions, and partner with other teams as needed.
Evaluate the current service tiers of our applications, along with reliability standards and practices, and define steps to continuously improve them.
Participate in blameless postmortems on critical incidents and help teams use their learnings to better predict, detect and prevent future issues.
Build robust continuous integration and continuous delivery (CI/CD) pipelines using on-premises as well as cloud tools.
Partner with the customer support team for rapid resolution of issues.
Evaluate and monitor social signals for site reliability. Monitor app store comments and other social channels.
Build a practice of rapid detection and root cause determination while keeping stakeholders informed.

Requirements

Bachelor's Degree in Computer Science, Information Technology, or a relevant field, or an equivalent combination of education and work experience.
Master's Degree in a technical discipline.
Minimum of 12 years of experience.
Minimum 10 years of experience developing large-scale digital software systems.
Exposure to cloud development and deployment technologies, including containerization, infrastructure as code, and multi-cloud configurations.
Deep understanding of DevOps and SRE best practices.
Hands-on experience with any two of the following CI/CD tools: Jenkins, Azure DevOps, Helm, Chef, Terraform.
Hands-on experience with Kubernetes, container orchestration, Docker, and cloud-native applications.
Hands-on experience managing application configuration using configuration management tools such as Ansible, Chef, or Azure App Configuration.
Hands-on experience with Grafana, JMeter, or similar performance monitoring and load-testing tools.
Experience creating, maintaining, and managing Git source code repositories on platforms such as Bitbucket or GitHub.
Strong scripting skills in languages such as Shell, Python, Ruby, Golang, or Java.
Experience with Git, Jira, Confluence, and similar issue-tracking and collaboration tools.

Benefits

Career development with an international company.
Free medical coverage in our Health Investment Plan (HIP) PPO medical plan in the next calendar year.
Excellent retirement savings plan with high employer contribution.
Tuition reimbursement, the Freedom 2 Save student debt program and FreeU education benefit.
Training and career development, with onboarding programs for new employees and tuition assistance.
Financial security through competitive compensation, incentives and retirement plans.
Health care and well-being programs including medical, dental, vision, wellness and occupational health programs.
Paid time off.
401(k) retirement savings with a generous company match.