Check Point Software Technologies - Seattle, WA

posted 3 months ago

Full-time - Mid Level
Seattle, WA
Professional, Scientific, and Technical Services

About the position

As a Site Reliability Engineer (SRE) at Check Point Software Technologies, you will play a crucial role in ensuring the reliability and performance of our security services. This position involves investigating complex production issues, enhancing system resilience, and expanding our monitoring coverage. You will collaborate with customer-facing teams to support technical investigations and implementations, while also automating processes to reduce workload and improve system stability. Your expertise will contribute to maintaining a high level of service for our customers, ensuring that we meet their real-time needs in the ever-evolving landscape of cyber security.

In this role, you will lead investigations into cross-functional production issues, working closely with other experts to identify root causes and implement effective solutions. You will be responsible for maintaining 100% monitoring coverage, developing strategies that focus on alerting for symptoms rather than just outages. Your efforts will help reduce workload and improve uptime, as well as enhance SLA response times through automation of production issue management.

Additionally, you will act as the R&D extension in North America, providing support for critical production issues during business hours. Your advanced troubleshooting skills will be essential in resolving complex network problems and recurring platform issues. You will also support Account Managers and the Customer Success team with complex and strategic implementations, ensuring that our infrastructure can grow and adapt to meet customer demands.

Responsibilities

  • Lead investigations of complex cross-functional production issues, collaborating with other group experts
  • Maintain 100% monitoring coverage, including building a monitoring strategy that alerts on symptoms rather than on outages
  • Reduce workload and improve uptime and SLA response time by implementing automation processes for production issues
  • Act as the R&D extension in North America, supporting critical production issues during North American business hours
  • Perform advanced troubleshooting of complex network problems and recurring platform issues
  • Support Account Managers and the Customer Success team with complex and strategic implementations
  • Design, build, and maintain core infrastructure that enables growth

Requirements

  • Strong experience with AWS
  • Strong experience with observability and monitoring systems (Datadog, Prometheus, Grafana, etc.), including designing and building advanced monitoring
  • Working experience in large-scale network and system engineering environments (ISP, Cloud Providers)
  • Experience with Linux system administration
  • Experience with networking technologies and protocols (TCP/IP, LAN, NAT, BGP, VPN, DNS, iSCSI)
  • Experience with Configuration Management and IaC tools (Ansible, Terraform)
  • Experience with coding complex automation and runbooks
  • Good familiarity with virtualization environments (Proxmox, OpenStack)
  • Scripting experience with Bash, Python, or similar
  • Proficiency with virtualized and containerized environments (ECS / Kubernetes)
  • Proven network debugging and problem-solving skills
  • Must be eligible to work in the United States without sponsorship now or in the future

Nice-to-haves

  • Experience with HashiCorp tools (Consul, Vault, Nomad)

Benefits

  • Healthcare benefits
  • 401(k) plan and company match
  • Short-term and long-term disability coverage
  • Basic life insurance
  • Stock awards
  • Employee stock purchase plan