The Matlen Silver Group - Atlanta, GA

posted 11 days ago

Full-time - Mid Level
Atlanta, GA
Professional, Scientific, and Technical Services

About the position

The Senior Site Reliability Engineer will focus on enhancing Delta's reliability engineering practices by driving cross-team initiatives aimed at improving application resiliency, uptime, and performance. The role requires a strong background in modern reliability disciplines and experience in implementing observability plans around logs, metrics, and traces.

Responsibilities

  • Set SLOs / SLIs / error budgets and manage reliability for infrastructure and applications.
  • Use scripting languages and tooling such as JavaScript (Node.js), Python, and Bash, along with Maven and Ansible.
  • Manage diverse systems with configuration management tools such as Puppet, Chef, and Ansible.
  • Eliminate toil by leveraging automation.
  • Manage incidents using tools like PagerDuty.
  • Implement monitoring and alerting systems such as Prometheus, Grafana, and Dynatrace.
  • Understand standard networking protocols and components including HTTP, DNS, TCP/IP, and Load Balancing strategies.
  • Work with Serverless Application Framework and containerized workloads using Docker or Kubernetes.
  • Be familiar with distributed systems and microservices.
  • Utilize infrastructure automation tools like CloudFormation and Terraform.
  • Understand CI/CD processes and deployment automation tools such as AWS CodePipeline, AWS CodeDeploy, Jenkins, and Bamboo.
  • Debug, troubleshoot, and solve problems effectively.
  • Communicate and collaborate with various business units and third parties.
  • Liaise with developers, operations staff, and third-party resources.
  • Integrate APIs and coach/mentor team members on reliability engineering.

Requirements

  • 5+ years of experience in DevOps practices.
  • Hands-on experience with AWS Cloud and DevOps principles.
  • 2+ years of experience working with DevOps tools (GitLab CI, AWS CodePipeline).
  • 2+ years of experience in scripting tools (Bash, Python, etc.).
  • 1+ year of experience in developing Node.js or TypeScript applications.
  • 2+ years of experience in building and supporting applications in AWS using their native services.
  • 1+ year of experience in AWS CDK.
  • Ability to troubleshoot and resolve problems with existing AWS Cloud Controls.

Nice-to-haves

  • 1+ year of experience in containerization technologies like Kubernetes, OpenShift, Docker.
  • 1+ year of experience in application resiliency evaluation using AWS FIS.
  • 1+ year of experience using Litmus for Chaos Engineering methods.
  • Exposure to Red Hat OpenShift Service on AWS (ROSA).