Raas Infotek - Atlanta, GA

posted 3 months ago

Full-time
Atlanta, GA
Professional, Scientific, and Technical Services

About the position

The Site Reliability Engineer (SRE) is a critical role focused on ensuring the reliability, availability, and performance of our systems. As an SRE, you will maintain highly available and scalable systems, drawing on extensive experience in cloud technologies and infrastructure management. You will work closely with development teams to implement best practices in automation, monitoring, and incident response so that our services meet high standards of reliability and performance.

In this role, you will use your strong proficiency in scripting and programming languages such as Python, Bash, Ruby, or Go to develop automation tools that improve operational efficiency. Your expertise in cloud platforms such as AWS, Azure, or Google Cloud Platform will be essential as you manage infrastructure with infrastructure-as-code tools such as Terraform or CloudFormation. You are also expected to have a solid understanding of containerization and orchestration technologies, particularly Docker and Kubernetes, to deploy and manage applications in a cloud environment.

You will implement and maintain monitoring and logging solutions using tools such as Prometheus, the ELK stack, or Splunk, and use distributed tracing frameworks such as Jaeger or Zipkin to troubleshoot complex issues in production environments. Strong problem-solving skills are crucial for identifying and resolving incidents quickly, minimizing downtime, and ensuring a seamless user experience.

Collaboration is key in this role: you will work in cross-functional teams and communicate clearly with both technical and non-technical stakeholders. Your ability to mentor and lead others will also be valuable as you help foster a culture of reliability and operational excellence within the organization.

Responsibilities

  • Maintain highly available and scalable systems.
  • Develop automation tools using scripting or programming languages.
  • Manage infrastructure using cloud technologies and infrastructure-as-code tools.
  • Implement and maintain containerization and orchestration technologies.
  • Administer Linux/Unix systems and apply networking concepts.
  • Utilize configuration management tools and version control systems.
  • Monitor and log system performance using tools such as Prometheus, the ELK stack, or Splunk, and distributed tracing frameworks such as Jaeger or Zipkin.
  • Troubleshoot complex issues in production environments.
  • Collaborate effectively with cross-functional teams.

Requirements

  • 8+ years of experience as a Site Reliability Engineer or in a similar role.
  • Strong proficiency in scripting or programming languages (e.g., Python, Bash, Ruby, Go).
  • Extensive experience with cloud technologies (e.g., AWS, Azure, Google Cloud Platform).
  • Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
  • Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Proficiency in Linux/Unix system administration and networking concepts.
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
  • Deep knowledge of monitoring and logging tools (e.g., Prometheus, ELK stack, Splunk).
  • Strong problem-solving skills and ability to troubleshoot complex issues.
  • Excellent communication and collaboration skills.

Nice-to-haves

  • Relevant certifications in cloud platforms (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer).
  • Experience with infrastructure resilience testing and chaos engineering.
  • Knowledge of security best practices and experience implementing security controls.
  • Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI/CD).
  • Experience with configuration management using GitOps principles.
  • Previous experience in a leadership or mentorship role.