Zscaler - San Jose, CA

posted 3 months ago

Full-time - Mid Level
Remote - San Jose, CA
Professional, Scientific, and Technical Services

About the position

As a Site Reliability Engineer at Zscaler, you will play a crucial role in ensuring the reliability and performance of our cloud security platform. This position requires a proactive approach to monitoring and maintaining the health of our applications and services. You will collaborate closely with the Software Engineering and Infrastructure teams to design, implement, and deploy comprehensive end-to-end monitoring solutions that enhance our operational capabilities.

Your responsibilities will include managing the deployment of patches, upgrades, and administrative tools so that our systems stay up to date and secure. You will also monitor applications and services, participate in on-call rotations, and address issues as they arise. Your ability to troubleshoot problems effectively and communicate solutions will be essential in preventing future incidents. Beyond day-to-day operations, you will develop strategies that strengthen the resilience of our cloud platform so that we consistently meet our demanding service level agreements (SLAs).

This role is ideal for someone with a strong background in 24/7 NOC operations and production cloud platforms who is passionate about solving technical challenges and optimizing processes. You will be part of a dynamic team dedicated to building and innovating in the cloud security space, contributing to Zscaler's mission of making the cloud a safe and enjoyable place for enterprise users.

Responsibilities

  • Work with Software Engineering and Infrastructure teams to design, implement, and deploy end-to-end monitoring solutions.
  • Manage the deployment of patches, upgrades, and administrative tools/utilities.
  • Monitor applications & services, participate in on-call rotations, and address issues while developing strategies to prevent future incidents.
  • Troubleshoot problems, resolve issues, and communicate solutions.

Requirements

  • 2+ years of experience working in 24/7 NOC operations, production cloud platforms, and associated processes and automation workflows.
  • Working knowledge of Linux/UNIX and related applications.
  • Familiarity with a programming language (e.g., Python, Go) and a scripting language (e.g., Bash).
  • High-level understanding of standard networking protocols and concepts such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting, and load balancing.

Nice-to-haves

  • Passion for solving the technical challenges of running a resilient cloud platform with demanding service level agreements (SLAs).
  • A constant drive to optimize systems and identify opportunities for process improvement.

Benefits

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks and more!
© 2024 Teal Labs, Inc