Xero
Posted about 2 months ago
$185,000 - $201,700/Yr
San Mateo, CA
Publishing Industries

About the position

At Xero, we're here to help you supercharge your business. We do this by automating routine tasks, surfacing actionable insights and connecting businesses with the right data, advisors and apps. When that happens, we're not only making life better for small business, we're building a stronger economy that can change the world.

The Xero Chaos Engineering Team is part of the Site Reliability Engineering organization and is responsible for continuously tuning the operational readiness and efficiency of Xero services. The team drives enduring reliability at Xero by intentionally introducing controlled disruptions or failures into systems, in both pre-production and production environments, to identify weaknesses and vulnerabilities before they become outages.

Responsibilities

  • Design and implement chaos experiments to identify weaknesses in system architecture and improve overall reliability.
  • Collaborate with cross-functional teams to develop strategies that enhance system resilience and ensure optimal performance in production environments.
  • Design and build a failure mode and chaos engineering environment that allows for repeatable and scalable testing.
  • Execute chaos experiments to simulate various failure scenarios.
  • Develop and maintain chaos engineering frameworks and tools.
  • Collaborate with development and operations teams to implement improvements based on experiment results.
  • Monitor system health and performance metrics to assess the impact of chaos experiments.
  • Educate team members on chaos engineering principles and best practices.
  • Analyze system behavior during experiments and document findings.
  • Continuously improve chaos engineering processes and methodologies.

Requirements

  • Proficient in programming languages and platforms such as Python, Go, Java, C#, C++ and .NET for automation and tool development.
  • Experienced in using chaos engineering tools like Gremlin, Chaos Monkey or Litmus.
  • Excellent analytical skills to assess system performance and identify weaknesses.
  • Effective communication skills to collaborate with cross-functional teams and convey complex concepts.
  • Leadership abilities to drive chaos engineering initiatives and foster a culture of resilience.
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes).
  • Familiarity with monitoring and observability tools to track system health and performance metrics.

Benefits

  • Generous paid leave to use however you'd like (plus statutory holidays).
  • Dedicated paid leave to care for your physical and mental wellbeing.
  • Employee Assistance Program to access mental health care for you and your family.
  • Health insurance, life insurance, and income protection.
  • Wellbeing and sports programmes.
  • Employee resource groups.
  • 26 weeks of paid parental leave for primary caregivers.
  • Employee Share Plan.
  • Beautiful offices with snacks and break areas.
  • Flexible working.
  • Career development.

Job Keywords

Hard Skills
  • C
  • Chaos Monkey
  • Go
  • Kubernetes
  • Python