Xyant Services - San Jose, CA

posted 5 days ago

Full-time
San Jose, CA
Professional, Scientific, and Technical Services

About the position

The Cloud Observability Engineer (Automation) role focuses on architecting and implementing large-scale observability platforms for distributed applications in cloud environments. The position requires collaboration with diverse teams to ensure operational excellence and high-quality service delivery to customers. The engineer will be responsible for developing and maintaining observability systems, enhancing performance, and driving strategic initiatives within the organization.

Responsibilities

  • Architect and implement large-scale observability platforms for distributed applications.
  • Develop, deploy, and run distributed applications on cloud platforms.
  • Ensure high levels of uptime and Quality of Service (QoS) for customers.
  • Define service level objectives (SLOs) and service level indicators (SLIs) to measure service quality.
  • Collaborate with SRE and Engineering/Product teams on critical initiatives.
  • Solve performance and stability issues using various tools.
  • Communicate effectively across teams to drive projects to completion.
  • Contribute to technical direction and strategic decisions within the organization.

Requirements

  • 4-5 or more years of production-level experience with distributed applications in public and/or private cloud environments.
  • B.S. degree in Computer Science or a related technical field.
  • Programming experience with languages like Go, Python, and Java.
  • Experience with UI technologies such as JavaScript, React, and Backstage.
  • Experience with observability and tooling systems like Splunk, Prometheus, GitHub, Jenkins, and Artifactory.
  • Experience with container platforms like Kubernetes.
  • Knowledge of cloud environments and their operational characteristics.
  • Experience designing systems for fault tolerance, scalability, and stability.

Nice-to-haves

  • Experience with observability tools like Grafana, Cortex, Tempo, and Jaeger.
  • Familiarity with open-source projects and communities such as OpenTelemetry.
  • Knowledge of cloud security and automation concepts and practices.
  • Experience promoting the DevOps/SRE approach.