Site Reliability Engineer

$135,200 - $137,280/Yr

Randstad - Roanoke, TX

posted 22 days ago

Full-time - Mid Level

Roanoke, TX

Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) role focuses on ensuring the reliability and performance of highly distributed multi-tiered systems. The position requires a strong background in cloud environments, container orchestration, and observability tools, along with the ability to manage incidents and automate processes. The SRE will provide enterprise Cloud and Platform Engineering support for production environments and participate in on-call rotations to resolve issues effectively.

Responsibilities

Deploy and support highly distributed multi-tiered systems at scale.
Manage and interpret large datasets using query languages and visualization tools.
Provide enterprise Cloud and Platform Engineering support for production environments.
Participate in the on-call rotation to resolve incidents.
Implement advanced observability practices and techniques at scale.
Automate day-to-day activities using Ansible and Python.
Handle a fleet of on-prem servers, including security and patching oversight.
Manage hundreds of SSL certificates for all applications in scope.
Perform chaos testing to ensure system resilience under pressure.
Collaborate with various teams to build and maintain effective relationships.

Requirements

Bachelor's degree in a technology-related field (e.g., Engineering, Computer Science).
5-8+ years of hands-on experience deploying and/or supporting distributed systems.
Hands-on experience with AWS and Azure cloud environments.
Experience with container orchestration, preferably Kubernetes.
Proficiency in scripting languages such as Korn shell, Bash, or JavaScript.
Solid understanding of Cloud Computing and DevOps concepts, including CI/CD pipelines.
Experience with observability tools such as Datadog, Prometheus, and Grafana.
Ability to triage and perform root cause analysis under pressure.
Strong communication skills for technical and non-technical audiences.

Nice-to-haves

Certifications in AWS or Azure cloud environments.
Experience with batch processing tools like Control-M or Informatica.
Familiarity with ITIL processes such as Incident and Change Management.
Experience with API testing tools like SoapUI and Postman.

Benefits

Health insurance coverage
401(k) contribution
Incentive and recognition program
