Site Reliability Engineer

$135,200 - $137,280/Yr

Randstad - Roanoke, TX

posted 22 days ago

Full-time - Mid Level
Roanoke, TX
Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) role focuses on ensuring the reliability and performance of highly distributed, multi-tiered systems. The position requires a strong background in cloud environments, container orchestration, and observability tools, along with the ability to manage incidents and automate operational processes. The SRE will provide enterprise Cloud and Platform Engineering support for production environments and participate in an on-call rotation to resolve incidents.

Responsibilities

  • Deploy and support highly distributed multi-tiered systems at scale.
  • Manage and interpret large datasets using query languages and visualization tools.
  • Provide enterprise Cloud and Platform Engineering support for production environments.
  • Participate in an on-call rotation to resolve incidents.
  • Implement advanced observability practices and techniques at scale.
  • Automate day-to-day activities using Ansible and Python.
  • Manage a fleet of on-premises servers, including oversight of security and patching.
  • Manage hundreds of SSL certificates for all applications in scope.
  • Perform chaos testing to verify that systems remain resilient under failure conditions.
  • Collaborate with cross-functional teams and build and maintain effective working relationships.

Requirements

  • Bachelor's degree in a technology-related field (e.g., Engineering, Computer Science).
  • 5-8+ years of hands-on experience deploying and/or supporting distributed systems.
  • Hands-on experience with AWS and Azure cloud environments.
  • Experience with container orchestration, preferably Kubernetes.
  • Proficiency in scripting languages such as Korn shell, Bash, or JavaScript.
  • Solid understanding of Cloud Computing and DevOps concepts, including CI/CD pipelines.
  • Experience with observability tools such as Datadog, Prometheus, and Grafana.
  • Ability to triage and perform root cause analysis under pressure.
  • Strong communication skills for both technical and non-technical audiences.

Nice-to-haves

  • Certifications in AWS or Azure cloud environments.
  • Experience with batch processing tools such as Control-M or Informatica.
  • Familiarity with ITIL processes such as Incident and Change Management.
  • Experience with API testing tools like SoapUI and Postman.

Benefits

  • Health insurance coverage
  • 401(k) contribution
  • Incentive and recognition program

© 2024 Teal Labs, Inc.