Principal Site Reliability Engineer (CDSS Advanced URL Filtering)

Palo Alto Networks - Santa Clara, CA

posted 22 days ago

Full-time - Principal

Onsite - Santa Clara, CA

Professional, Scientific, and Technical Services

About the position

The Principal Site Reliability Engineer at Palo Alto Networks will play a crucial role in enhancing the reliability and scalability of the company's systems, particularly within the CDSS Advanced URL Filtering team. This position involves optimizing infrastructure costs, defining service-level objectives, and collaborating with cross-functional teams to ensure high availability of applications. The role requires a strong focus on automation, monitoring, and continuous improvement to support the company's mission of protecting the digital way of life.

Responsibilities

Optimize infrastructure costs by monitoring resource utilization, rightsizing instances, and reducing waste to improve cost-efficiency.
Define and manage service-level objectives (SLOs) and related metrics to ensure service reliability and alignment with business goals.
Design and maintain secure cloud infrastructure that prioritizes reliability, scalability, and efficiency.
Develop expertise in new technologies to enhance infrastructure and operations.
Collaborate with cross-functional teams to ensure applications are production-ready and highly available.
Automate deployments, monitoring, and alerting to streamline operations and improve reliability.
Diagnose and resolve critical issues, driving optimization and continuous improvement.
Participate in on-call rotations to support seamless service operations.
Contribute to design reviews to enhance system performance and scalability.

Requirements

Expertise in provisioning and managing cloud infrastructure on public or private cloud platforms (GCP, AWS, or Azure preferred).
Strong proficiency in tools like Kubernetes, Terraform, and Ansible.
Proficiency in managing and optimizing SQL and NoSQL databases, including operational tasks such as provisioning, scaling, monitoring, backups, and troubleshooting.
Deep understanding of distributed systems, high-availability architecture, and strategies for scaling and optimizing system performance.
Proven experience defining and managing SLAs, SLOs, and SLIs to ensure service reliability and business alignment.
Expertise in monitoring and optimizing cloud infrastructure costs, including allocating resources and implementing efficient practices.
Hands-on experience with Envoy or similar load balancing technologies, along with strong Linux system administration and advanced network troubleshooting skills.
Advanced skills in programming and automation using Python, Golang, or shell scripting.
Proven experience managing production deployments, ensuring system stability, and enforcing DevOps best practices.
Familiarity with CI/CD pipelines (GitLab CI preferred) and expertise in designing robust monitoring and alerting systems.
Exceptional ability to work with cross-functional teams, communicate effectively, and provide technical leadership.
BS/MS in Computer Science, Computer Engineering, or a related field, with 8+ years of hands-on industry experience in Site Reliability Engineering or a similar role.

Nice-to-haves

Experience with platforms like BigQuery, MongoDB, Cloud SQL, Firestore, Bigtable, and MySQL.
Strong communication skills and a drive to make a meaningful impact.

Benefits

FLEXBenefits wellbeing spending account with over 1,000 eligible items.
Mental and financial health resources.
Personalized learning opportunities.
