Senior Site Reliability Engineer ATL

The Matlen Silver Group - Atlanta, GA

posted 11 days ago

Full-time - Senior

Onsite - Atlanta, GA

Professional, Scientific, and Technical Services

About the position

The Senior Site Reliability Engineer will focus on enhancing the reliability of applications and infrastructure through modern reliability practices. This role involves driving cross-team initiatives aimed at improving application performance, uptime, and resiliency. The ideal candidate will have a strong background in observability, automation, and incident management, with a proven ability to implement effective reliability engineering strategies.

Responsibilities

Define SLOs, SLIs, and error budgets, and manage reliability for infrastructure and applications.
Use scripting languages such as JavaScript (Node.js), Python, and Bash, along with build and automation tools such as Maven and Ansible.
Manage diverse systems with configuration management tools such as Puppet, Chef, and Ansible.
Eliminate toil through automation.
Manage incidents using tools like PagerDuty.
Implement monitoring and alerting systems such as Prometheus, Grafana, and Dynatrace.
Understand networking protocols and components, including HTTP, DNS, TCP/IP, and load-balancing strategies.
Work with Serverless Application Framework and containerized workloads using Docker or Kubernetes.
Be familiar with distributed systems and microservices architecture.
Automate infrastructure using tools like CloudFormation and Terraform.
Understand CI/CD processes and deployment automation tools such as AWS CodePipeline and Jenkins.
Debug, troubleshoot, and solve problems effectively.
Communicate and collaborate with various business units and third parties.
Liaise with developers, operations staff, and third-party resources.
Integrate APIs and mentor team members on reliability engineering.

Requirements

5+ years of experience in DevOps practices.
Hands-on experience with AWS Cloud and DevOps principles.
2+ years of experience working with DevOps tools (e.g., GitLab CI, AWS CodePipeline).
2+ years of experience in scripting tools (Bash, Python, etc.).
1+ year of experience developing Node.js or TypeScript applications.
2+ years of experience in building and supporting applications in AWS using native services.
1+ year of experience with AWS CDK.
Ability to troubleshoot and resolve problems with existing AWS Cloud Controls.

Nice-to-haves

1+ year of experience with containerization technologies such as Kubernetes, OpenShift, and Docker.
1+ year of experience evaluating application resiliency with AWS FIS.
1+ year of experience applying chaos engineering methods with Litmus.
Exposure to Red Hat OpenShift Service on AWS (ROSA).
