Oteemo, Inc (posted about 2 months ago)
San Diego, CA

About the position

This position focuses primarily on providing design and implementation expertise for the provisioning, management, and lifecycle of cloud components and services, containers, and other core elements of DevSecOps practice.

Responsibilities

  • Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications.
  • Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis.
  • Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured.
  • Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes.
  • Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions such as Thanos for long-term storage and MinIO for object storage.
  • Implement observability best practices that comply with security frameworks and integrate with Kubernetes security tools such as NeuVector.
  • Automate observability deployments using Terraform, Helm, and Kubernetes Operators.
  • Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation.

Requirements

  • Active Secret or Top Secret Clearance.
  • Strong Kubernetes expertise in managing and monitoring clusters at scale.
  • Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir.
  • Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry.
  • Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations.
  • Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring.
  • Proficiency in Python, Go, or Bash for automating observability tasks.
  • Experience with Terraform, Helm, and Kubernetes Operators.
  • Strong troubleshooting and root cause analysis skills in large-scale distributed systems.
  • Experience working in air-gapped or limited-connectivity environments is a plus.

Nice-to-haves

  • Experience with NeuVector, Falco, or other Kubernetes security monitoring tools.
  • Knowledge of eBPF-based observability tools such as Cilium Hubble.
  • Experience optimizing observability stacks for performance and cost efficiency.
  • Familiarity with DevSecOps practices and compliance frameworks.

Benefits

  • Ability to make a noticeable difference for the organization and our customers.
  • Tremendous growth opportunities as part of a rapidly expanding organization.
  • Complex but interesting challenges to improve the depth and breadth of your technical and business skills.
  • Competitive pay and benefits.

Job Keywords

Hard Skills
  • Bash
  • Go
  • Kubernetes
  • Prometheus
  • Terraform
© 2024 Teal Labs, Inc