Palo Alto Networks - Santa Clara, CA

posted 2 months ago

Full-time - Principal
Santa Clara, CA
Professional, Scientific, and Technical Services

About the position

The Principal Site Reliability Engineer at Palo Alto Networks will enhance the Advanced Threat Prevention (ATP) Infrastructure team by developing mission-critical platforms, tools, and processes to ensure high availability and reliability of applications. This role requires innovative problem-solving skills and a deep understanding of cloud infrastructure, particularly within Google Cloud Platform (GCP). The engineer will collaborate with developers to improve service usability and automate infrastructure management.

Responsibilities

  • Write automation code for provisioning and operating infrastructure at massive scale
  • Design, build, and operate cloud infrastructure for reliable and rapid deployment of microservices
  • Work with development teams to ensure applications are production ready, scalable, and reliable
  • Identify and drive opportunities to improve automation for code deployment and management
  • Develop tools and frameworks to automate operational tasks and deployment of services
  • Establish end-to-end monitoring and alerting on critical application components
  • Participate in on-call rotation supporting the platform and production applications
  • Direct root cause analysis of critical business and production issues
  • Develop and mentor other SREs on standard methodologies
  • Represent SRE in design reviews and collaborate with Engineering teams on operational readiness

Requirements

  • BS or MS in Computer Science or related field, or equivalent professional/military experience
  • Expertise in infrastructure-as-code and configuration management tools such as Terraform, Ansible, and Helm
  • Strong experience with Kubernetes
  • Strong Linux administration and network troubleshooting skills
  • Expertise in Google Cloud Platform (GCP) and resource management
  • Proficiency in Python and shell scripting
  • Strong experience with CI/CD pipelines and related tooling such as GitHub, Jenkins, and Artifactory
  • Experience with metrics and monitoring tools such as Grafana and Prometheus
  • Ability to diagnose and troubleshoot complex distributed systems
  • Strong fundamentals in API gateways and proxies such as Nginx or Envoy
  • Experience with cloud infrastructure performance and cost optimizations
  • Excellent interpersonal skills and teamwork ability
  • Passion for learning new technology stacks

Nice-to-haves

  • Experience with AWS
  • Experience in building and managing large relational database clusters (MySQL/Percona)

Benefits

  • FLEXBenefits wellbeing spending account
  • Mental and financial health resources
  • Personalized learning opportunities
  • Restricted stock units
  • Bonus opportunities