3i People - Atlanta, GA
posted 4 months ago
We have a position for a Sr. Site Reliability Engineer with one of our clients in Atlanta, GA, for an initial contract duration of 5 months. This role is crucial in leading and mentoring a team of Site Reliability Engineers (SREs), fostering a culture of collaboration, continuous learning, and operational excellence. The selected candidate will drive the adoption of SRE best practices and ensure adherence to reliability and performance standards across the organization.

The Sr. Site Reliability Engineer will be responsible for designing and implementing highly available, scalable, and fault-tolerant systems using AWS. This includes collaborating with software engineering teams and other SREs to influence design and architecture decisions that improve system reliability and performance. The role also involves developing and maintaining automation scripts and tools to streamline operations, deployments, and monitoring processes. Infrastructure as Code (IaC) and automation tools such as Terraform, CloudFormation, and GitHub Actions will be essential for managing infrastructure effectively.

The engineer will implement and maintain robust monitoring, alerting, and logging systems using tools like Splunk, Grafana, or New Relic. Leading incident response efforts, conducting root cause analysis, and implementing measures to prevent recurrence will also be key responsibilities. The engineer will oversee the design and maintenance of CI/CD pipelines using tools like Jenkins, GitLab CI, or CircleCI, ensuring seamless and efficient code deployment processes that reduce time to market and increase system reliability. Performance tuning and capacity planning are also part of the role, to ensure systems can handle growing workloads, as is troubleshooting to identify and resolve performance bottlenecks in infrastructure and applications.