Palo Alto Networks - Santa Clara, CA

posted 4 months ago

Full-time - Principal
Santa Clara, CA
Professional, Scientific, and Technical Services

About the position

At Palo Alto Networks, we are on a mission to be the cybersecurity partner of choice, protecting our digital way of life. Our vision is to create a world where each day is safer and more secure than the one before. We are looking for innovators who are committed to shaping the future of cybersecurity.

The WildFire Team within the Content Delivered Security Service (CDSS) organization is at the forefront of this mission, delivering security services in the cloud to prevent cyberattacks. We are constantly innovating and challenging the status quo in cybersecurity, and we need a motivated Principal Site Reliability Engineer (SRE) to join our team.

As a Principal SRE, you will operate production services and applications on Google Cloud Platform (GCP) while continuously improving the deployment, monitoring, operability, and uptime of our services. You will work closely with the R&D team to develop and enhance software architecture, focusing on scalability, service reliability, cost, and performance. Your role will involve building automated tools for cloud operations, developing infrastructure-as-code, and improving our CI/CD pipeline and automation processes. You will also explore new technologies, particularly for CI/CD and continuous training for ML/AI systems, to enhance our model release processes.

We value flexibility and choice in how we work: employees work from the office three days a week and remotely two days a week. This setup fosters collaboration and innovation, creating an environment where our engineering team can thrive and tackle the challenges of cybersecurity head-on.

If you are passionate about technology, have a strong sense of responsibility for high reliability and service, and are excited about the opportunity to innovate in the cybersecurity space, we want to hear from you!

Responsibilities

  • Operate production services and applications in GCP.
  • Continuously improve application deployment, monitoring, operability, and uptime of services.
  • Work closely with the R&D team to develop and enhance software architecture for scalability, service reliability, cost, and performance.
  • Build automated tools for cloud operations, including automated resource provisioning and remediation of known issues.
  • Develop infrastructure-as-code for orchestrating production and development environments.
  • Improve the CI/CD pipeline and automation processes.
  • Explore and research new technologies for CI/CD and continuous training for ML/AI systems.

Requirements

  • Expert-level experience as a DevOps engineer or SRE.
  • High proficiency with Terraform and Ansible.
  • Proficiency with cloud operations, preferably GCP.
  • Proficiency with CI/CD and configuration management, preferably GitLab.
  • Proficiency with programming languages, preferably Python.
  • Experience with MLOps is a big plus.
  • Ability to operate independently and take responsibility for decisions and actions.
  • Effective communication and interpersonal skills to coordinate with cross-functional and international teams.
  • Hands-on, can-do attitude and willingness to learn new technologies.
  • BS/MS in computer science/engineering or equivalent experience.

Nice-to-haves

  • Experience with machine learning operations (MLOps).
  • Familiarity with additional cloud platforms beyond GCP.

Benefits

  • Competitive salary between $170,000/yr and $225,000/yr, based on qualifications and experience.
  • Restricted stock units and bonuses may be included in the compensation package.
  • Flexible work environment with the option to work remotely two days a week.
  • Comprehensive health insurance coverage.
  • Professional development opportunities and support for growth.
  • Diversity and inclusion programs.