Engle Martin & Associates - Atlanta, GA

posted 8 days ago

Full-time - Mid Level
Atlanta, GA
Insurance Carriers and Related Activities

About the position

The DevOps / Site Reliability Engineer (SRE) is responsible for deploying and managing infrastructure using modern DevOps practices. This role requires expertise in Kubernetes, Terraform, and observability tools like DataDog, ensuring the reliability, scalability, and performance of systems. The SRE collaborates with development and operations teams to implement CI/CD pipelines, monitor system performance, and troubleshoot issues, while also participating in an on-call rotation to maintain system uptime.

Responsibilities

  • Designs, deploys, and maintains cloud infrastructure using Kubernetes and Terraform.
  • Collaborates with development teams to implement CI/CD pipelines and automate deployment processes.
  • Monitors system performance, troubleshoots issues, and implements solutions to optimize performance and ensure uptime.
  • Develops and maintains monitoring and alerting systems using observability tools such as DataDog.
  • Implements and manages microservices architectures, ensuring seamless communication and scalability.
  • Troubleshoots and resolves issues related to infrastructure, deployments, and performance, ensuring high availability and reliability of systems.
  • Stays updated on emerging technologies and industry trends and incorporates them into infrastructure and practices where applicable.
  • Participates in a weekday on-call rotation to address issues and incidents, ensuring system reliability and availability.
  • Collaborates closely with the rest of the team, sharing responsibility for the work committed to in each sprint.
  • Establishes and maintains positive working relationships with other members of the organization across departments, divisions, and locations.
  • Maintains the confidentiality of proprietary and sensitive information.

Requirements

  • Bachelor's degree in computer science, engineering, or a related field, or equivalent work experience.
  • 3-5 years of experience in a DevOps role required; experience as a Site Reliability Engineer preferred.
  • Prior experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
  • Prior experience with observability and monitoring platforms such as DataDog, Dynatrace, or Splunk.
  • Certification in relevant cloud technologies preferred (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
  • Prior experience with Azure Kubernetes Service (AKS) preferred.
  • Experience with other DevOps tools and technologies such as Azure DevOps, Jenkins, or GitLab CI/CD preferred.

Nice-to-haves

  • Strong proficiency in Kubernetes and Terraform for managing and deploying infrastructure.
  • Solid understanding of microservices architecture and experience in deploying and managing microservices-based systems.
  • Proficiency in scripting languages such as Python or shell (e.g., Bash) for automation tasks.
  • Familiarity with Agile methodologies and practices.
  • Knowledge of security best practices for cloud environments.
  • Excellent problem-solving skills and ability to troubleshoot complex issues in distributed systems.
  • Strong communication and collaboration skills, with the ability to work effectively across teams in a fast-paced, agile environment.

Benefits

  • Professional development opportunities
  • Flexible working hours
  • On-call rotation compensation
  • Health insurance coverage
  • Paid time off for personal and family needs