Rays Techsolutions Inc. - Mountain View, CA

posted 4 days ago

Full-time - Mid Level
Mountain View, CA

About the position

The Site Reliability Engineer (SRE) role focuses on designing, implementing, and maintaining complex data systems that support millions of customers. The position emphasizes the application of Cloud Native principles and best practices to ensure the availability, security, performance, and scalability of database systems. The SRE will also engage in operational tasks, including on-call support, automation, and continuous improvement of system performance.

Responsibilities

  • Design, implement, and maintain complex data systems supporting millions of customers with Cloud Native principles and best practices.
  • Build and maintain CI/CD pipelines in Jenkins.
  • Build and deploy services in Kubernetes clusters using Helm and customization.
  • Contribute to AWS infrastructure changes, drawing on a deep understanding of AWS services.
  • Engage in on-call for pre-production and production systems supporting millions of users.
  • Write and review RCA docs to prevent incidents from recurring and to share learnings.
  • Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes.
  • Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team.
  • Participate in and contribute to FMEA and chaos testing, security remediation, and related efforts.
  • Share best practices and patterns for operational excellence and cost optimization.
  • Reduce or eliminate manual steps by automating as much as possible.
  • Continuously look for opportunities to increase developer velocity and productivity.

Requirements

  • Bachelor's or master's degree in computer science or a related technical field, or equivalent experience.
  • 4+ years of hands-on development and operational experience with building and maintaining infrastructure in AWS.
  • Extensive performance monitoring, troubleshooting, and tuning experience.
  • Experience with AWS services and hands-on knowledge of cloud hosting.
  • Experience with scripting languages for DevOps automation.
  • Experience with at least one of the following programming languages: Java, Python, or Ruby.
  • Knowledge of Docker, Kubernetes, and ArgoCD.
  • Experience with monitoring and observability tools such as Splunk, Wavefront, AppDynamics, and Prometheus, as well as tracing.