Principal Site Reliability Engineer (Northridge, CA)

$116,000 - $174,000/Yr

Medtronic Minimed Inc. - Los Angeles, CA

posted 17 days ago

Full-time - Senior

About the position

The Principal Site Reliability Engineer (SRE) at Medtronic is responsible for ensuring the health, performance, and reliability of mission-critical applications and systems, including SAP, Salesforce, Jira, and Confluence. This role merges software engineering with IT operations to facilitate seamless deployment, stability, and scalability of systems. The SRE will collaborate with cross-functional teams to design infrastructure solutions, develop automation strategies, and drive continuous improvement initiatives, with a strong focus on security, compliance, and business continuity.

Responsibilities

Design, implement, and maintain scalable, highly available systems architecture for critical business applications like SAP, Salesforce, Jira, and Confluence.
Proactively monitor and manage performance, availability, and capacity while implementing automation strategies to streamline processes and reduce manual efforts.
Collaborate with development and IT teams for seamless deployments, configuration changes, and infrastructure upgrades, ensuring alignment with business needs.
Conduct root cause analysis, resolve complex technical issues, and implement SRE best practices, focusing on service-level objectives (SLOs) and service-level indicators (SLIs).
Manage and optimize cloud infrastructure (including hybrid and multi-cloud environments) to ensure cost efficiency, performance, and scalability.
Lead disaster recovery planning, business continuity initiatives, and capacity planning to maintain system resilience and availability.
Implement security and compliance best practices, monitor for vulnerabilities, and ensure adherence to regulatory and industry standards (e.g., SOX, GDPR, HIPAA).
Drive change and release management processes, plan and coordinate production deployments, and optimize workflows.
Serve as a technical expert in enterprise applications, develop comprehensive documentation, and mentor junior team members.
Collaborate with stakeholders to prioritize projects, align SRE initiatives with business goals, and provide technical leadership across the organization.

Requirements

7+ years of experience with a bachelor's degree, 5+ years of experience with an advanced degree, or 12+ years of experience with a high school diploma or equivalent.
In-depth knowledge of enterprise application systems, including SAP, Salesforce, Jira, and Confluence.
Advanced experience with cloud platforms (AWS, Azure, or GCP) and cloud-native services, such as container orchestration (Kubernetes) and serverless computing.
Strong proficiency in scripting languages such as Python, Bash, or PowerShell for automating infrastructure and application management tasks.
Solid understanding of CI/CD pipelines, GitOps, and Infrastructure as Code using tools like Jenkins, GitHub Actions, Terraform, or CloudFormation.
Experience with monitoring, logging, and observability tools such as Prometheus, Grafana, ELK Stack, or Splunk.
Familiarity with IT service management (ITSM) practices and tools like ServiceNow.
Excellent troubleshooting and analytical skills with the ability to navigate complex technical environments.
Strong communication and collaboration skills, with the ability to work effectively across multiple teams and stakeholders.
Ability to manage multiple priorities in a fast-paced, dynamic environment with a focus on delivering value and driving results.
Knowledge of disaster recovery planning, capacity planning, and performance optimization strategies.
Familiarity with the ITIL framework and experience implementing IT service management principles.

Nice-to-haves

Bachelor's degree in Computer Science, Information Technology, or a related field.
10+ years of experience in IT infrastructure and operations roles, with at least 5 years in a Site Reliability Engineering or DevOps position.
Proven track record of supporting and managing enterprise applications such as SAP, Salesforce, Jira, and Confluence.
Extensive experience with cloud platforms (AWS, Azure, or GCP) and associated automation tools.
Strong proficiency in scripting/programming languages such as Python, Bash, or JavaScript.
Experience implementing and managing CI/CD pipelines and infrastructure as code.
Deep understanding of monitoring and observability principles and experience with industry-standard tools.
Prior experience in incident management, problem resolution, and root cause analysis in large-scale environments.
Experience with hybrid or multi-cloud environments and cloud infrastructure optimization.
Understanding of security practices, compliance requirements, and vulnerability management.
Certifications such as AWS Certified Solutions Architect, Google Professional Cloud Architect, or Azure Solutions Architect Expert.
Experience with additional enterprise applications, such as ServiceNow, SAP HANA, or Oracle ERP.
Knowledge of security practices, including identity and access management, data encryption, and vulnerability management.
Experience with agile methodologies and tools for project management and software delivery (e.g., Scrum, Kanban).
Familiarity with DevSecOps principles and practices, integrating security into the CI/CD pipeline and automating security testing.
Previous experience in a leadership role, managing a team of engineers or SRE professionals.
ITIL certification and experience with IT service management frameworks.
Strong leadership and mentorship abilities, with experience guiding teams in best practices and technical expertise.

Benefits

Competitive Salary
Flexible Benefits Package
Short-term incentive called the Medtronic Incentive Plan (MIP)
