InfoVision - Atlanta, GA

posted 19 days ago

Full-time
Atlanta, GA
Professional, Scientific, and Technical Services

About the position

The Systems SRE (Site Reliability Engineer) role focuses on ensuring the reliability and performance of systems within the Google Cloud Platform environment. This position involves automating deployment processes, monitoring applications and infrastructure, and collaborating with stakeholders to improve the software development lifecycle. The role is critical to maintaining system scalability and reducing repetitive tasks, and it includes on-call duties to address incidents as they arise.

Responsibilities

  • Monitoring application and infrastructure performance.
  • Automating the deployment process to reduce toil.
  • Improving the software development lifecycle by conducting post-incident reviews and documenting solutions.
  • Developing and maintaining systems and services, ensuring that they scale and that preventive measures are in place.
  • Collaborating with key stakeholders to align on system improvements.
  • Reducing repetitive work for the team through effective automation.
  • Handling on-call duties to diagnose, mitigate, fix, or escalate incidents as necessary.

Requirements

  • 10+ years of overall IT experience.
  • Strong knowledge of a cloud platform, preferably Google Cloud Platform (GCP).
  • Proficient in Terraform for infrastructure as code.
  • Understanding of microservice architecture, infrastructure, and networking.