Xero
Posted about 2 months ago
$18,500 - $201,700/Yr
San Mateo, CA
Publishing Industries

About the position

At Xero, we're here to help you supercharge your business. We do this by automating routine tasks, surfacing actionable insights and connecting businesses with the right data, advisors and apps. When that happens, we're not only making life better for small business, we're building a stronger economy that can change the world.

The Xero Chaos Engineering Team is part of the Site Reliability Engineering organization and is responsible for continuously tuning the operational readiness and efficiency of Xero services. The team drives enduring reliability at Xero by intentionally introducing controlled disruptions and failures into systems, in both pre-production and production environments, so that weaknesses and vulnerabilities are identified before they become outages.

Responsibilities

  • Design and implement chaos experiments to identify weaknesses in system architecture and improve overall reliability.
  • Collaborate with cross-functional teams to develop strategies that enhance system resilience and ensure optimal performance in production environments.
  • Design and build a failure mode and chaos engineering environment that allows for repeatable and scalable testing.
  • Execute chaos experiments to simulate various failure scenarios.
  • Develop and maintain chaos engineering frameworks and tools.
  • Collaborate with development and operations teams to implement improvements based on experiment results.
  • Monitor system health and performance metrics to assess the impact of chaos experiments.
  • Educate team members on chaos engineering principles and best practices.
  • Analyze system behavior during experiments and document findings.
  • Continuously improve chaos engineering processes and methodologies.
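
For candidates unfamiliar with the workflow these responsibilities describe, the following is a minimal, illustrative sketch of a chaos experiment loop: verify a steady state, inject a controlled fault, observe, then roll back. It is not Xero's tooling. `ToyService`, `run_experiment`, and the 0.99 success-rate threshold are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyService:
    """Hypothetical in-memory stand-in for a replicated service."""
    replicas: int = 3
    healthy: int = 3

    def success_rate(self) -> float:
        # Toy model: requests succeed while at least one replica is healthy.
        return 1.0 if self.healthy > 0 else 0.0


@dataclass
class ExperimentResult:
    name: str
    steady_before: bool
    steady_during: Optional[bool]  # None if the experiment was aborted
    steady_after: Optional[bool]

    @property
    def weakness_found(self) -> bool:
        # A weakness is a steady state that held before the fault but not during it.
        return self.steady_before and self.steady_during is False


def run_experiment(
    name: str,
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    """Run one experiment: check steady state, inject a fault, observe, roll back."""
    before = steady_state()
    if not before:
        # Never inject faults into a system that is already unhealthy.
        return ExperimentResult(name, False, None, None)
    inject()
    try:
        during = steady_state()
    finally:
        rollback()  # Always undo the fault, even if the observation raises.
    after = steady_state()
    return ExperimentResult(name, before, during, after)


if __name__ == "__main__":
    service = ToyService()

    def steady() -> bool:
        return service.success_rate() >= 0.99  # assumed threshold

    def kill_one() -> None:
        service.healthy -= 1

    def kill_all() -> None:
        service.healthy = 0

    def restore() -> None:
        service.healthy = service.replicas

    for label, fault in [("lose one replica", kill_one), ("lose all replicas", kill_all)]:
        result = run_experiment(label, steady, fault, restore)
        print(label, "weakness found:", result.weakness_found)
```

In this toy model, losing one replica leaves the steady state intact, while losing all replicas exposes a weakness. A real experiment would replace the lambdas with calls to monitoring systems and a fault-injection tool.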

Requirements

  • Proficient in programming languages and platforms such as Python, Go, Java, C#, C++, and .NET for automation and tool development.
  • Experienced in using chaos engineering tools like Gremlin, Chaos Monkey or Litmus.
  • Excellent analytical skills to assess system performance and identify weaknesses.
  • Effective communication skills to collaborate with cross-functional teams and convey complex concepts.
  • Leadership abilities to drive chaos engineering initiatives and foster a culture of resilience.
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes).
  • Familiarity with monitoring and observability tools to track system health and performance metrics.

Benefits

  • Generous paid leave to use however you'd like (plus statutory holidays).
  • Dedicated paid leave to care for your physical and mental wellbeing.
  • Employee Assistance Program to access mental health care for you and your family.
  • Health insurance, life insurance, and income protection.
  • Wellbeing and sports programmes.
  • Employee resource groups.
  • 26 weeks of paid parental leave for primary caregivers.
  • Employee Share Plan.
  • Beautiful offices with snacks and break areas.
  • Flexible working.
  • Career development.

Job Keywords

Hard Skills
  • Chaos Monkey
  • Go
  • Java
  • Kubernetes
  • Python