Site Reliability Engineer

$122,720 - $124,800/Yr

Randstad - Merrimack, NH

posted 5 months ago

Full-time - Mid Level

Merrimack, NH

Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) combines software engineering and systems engineering to build and maintain large-scale, distributed, fault-tolerant systems. The role is based in Merrimack, NH, or Westlake, TX, and is offered as a contract position paying $59 to $60 per hour. The SRE will administer Kubernetes clusters, troubleshoot issues, and ensure the reliability and performance of applications in a cloud environment. The ideal candidate has a strong background in systems and platform operations, with a focus on automation and continuous integration/continuous deployment (CI/CD) practices.

In this role, the engineer will work closely with development teams to implement and manage infrastructure as code using tools such as Terraform and CloudFormation. The SRE is also expected to have hands-on experience with log aggregation and monitoring tools, so that systems are monitored effectively and alerts are configured for potential issues. A strong understanding of application networking and AWS cloud security is essential, because the engineer will be responsible for implementing AWS products and services that improve system reliability.

The position calls for a passion for DevOps culture and a commitment to continuous learning and improvement. The SRE will be part of a globally distributed team, so strong communication skills are needed to collaborate across time zones and cultures. Working hours are 9 AM to 5 PM, and the candidate should be comfortable in an agile environment with changing requirements and priorities.

Responsibilities

Manage Kubernetes cluster administration and troubleshoot Kubernetes issues.
Develop and maintain Python applications and APIs.
Implement and manage CI/CD pipelines using tools like Ansible and Jenkins.
Utilize log aggregation and monitoring tools for data visualization and alerting.
Work with infrastructure as code tools such as Terraform and CloudFormation.
Ensure AWS cloud security and account management best practices are followed.
Collaborate with globally distributed teams to enhance system reliability.
Participate in an agile environment, adapting to changing requirements.

Requirements

3+ years of experience in systems and platform operations and technology management.
Experience with Amazon EKS and Kubernetes cluster administration.
Strong proficiency in Linux and shell scripting.
Proficient in Python programming and API development.
Understanding of application networking principles.
Hands-on experience with log aggregation and monitoring tools such as Datadog, Splunk, ELK, Prometheus, and Grafana.
Experience with AWS products and services implementation.
Familiarity with container technologies, particularly Docker.
Experience with CI/CD pipeline implementation using Ansible, Jenkins, and ArgoCD.
Knowledge of AWS cloud security and account management practices.
Strong communication skills, both written and oral.

Nice-to-haves

Passion for learning new technologies and practices.
Experience working in a DevOps culture.
Comfortable in an agile work environment.
