The Matlen Silver Group - Atlanta, GA

posted 11 days ago

Full-time - Senior
Onsite - Atlanta, GA
Professional, Scientific, and Technical Services

About the position

The Senior Site Reliability Engineer will focus on enhancing the reliability of applications and infrastructure through modern reliability practices. This role involves driving cross-team initiatives aimed at improving application performance, uptime, and resiliency. The ideal candidate will have a strong background in observability, automation, and incident management, with a proven ability to implement effective reliability engineering strategies.

Responsibilities

  • Define SLOs, SLIs, and error budgets, and manage reliability for infrastructure and applications.
  • Use scripting languages and automation tools such as JavaScript, Node.js, Python, Bash, Maven, and Ansible.
  • Manage diverse systems with configuration management tools like Puppet, Chef, and Ansible.
  • Eliminate toil through automation.
  • Manage incidents using tools like PagerDuty.
  • Implement monitoring and alerting systems such as Prometheus, Grafana, and Dynatrace.
  • Understand networking protocols and components including HTTP, DNS, TCP/IP, and load-balancing strategies.
  • Work with Serverless Application Framework and containerized workloads using Docker or Kubernetes.
  • Be familiar with distributed systems and microservices architecture.
  • Automate infrastructure using tools like CloudFormation and Terraform.
  • Understand CI/CD processes and deployment automation tools like AWS CodePipeline and Jenkins.
  • Debug, troubleshoot, and solve problems effectively.
  • Communicate and collaborate with various business units and third parties.
  • Liaise with developers, operations staff, and third-party resources.
  • Integrate APIs and mentor team members on reliability engineering.

Requirements

  • 5+ years of experience in DevOps practices.
  • Hands-on experience with AWS Cloud and DevOps principles.
  • 2+ years of experience working with DevOps tools (GitLab CI, AWS CodePipeline).
  • 2+ years of experience with scripting tools (Bash, Python, etc.).
  • 1+ year of experience developing Node.js or TypeScript applications.
  • 2+ years of experience in building and supporting applications in AWS using native services.
  • 1+ year of experience in AWS CDK.
  • Ability to troubleshoot and resolve problems with existing AWS Cloud Controls.

Nice-to-haves

  • 1+ year of experience in containerization technologies like Kubernetes, OpenShift, Docker.
  • 1+ year of experience in application resiliency evaluation using AWS FIS.
  • 1+ year of experience using Litmus for Chaos Engineering methods.
  • Exposure to Red Hat OpenShift Service on AWS (ROSA).