Ally Financial - Charlotte, NC
posted 5 months ago
As a Site Reliability Engineer (SRE) Manager at Ally Financial, you will play a pivotal role in ensuring the reliability and scalability of our complex systems. You will manage the SRE team, which includes both Ally employees and contractors, and collaborate with cross-functional teams to design, build, and maintain robust, scalable, and fault-tolerant systems. You will also advocate for reliability best practices throughout the application development lifecycle, so that our systems are not only functional but also resilient and efficient.

In this role, you will design and implement monitoring and alerting systems that provide real-time visibility into user experience and system health. You will monitor and analyze system performance, proactively identifying potential issues and implementing solutions to keep performance and reliability optimal. You will also develop and maintain automated tools and processes that streamline operational tasks and reduce the need for manual intervention.

Your participation in incident response and post-mortems will contribute to our continuous improvement efforts, ensuring that we learn from past incidents and improve our systems accordingly. You will conduct capacity planning and resource optimization to handle growing demands on our infrastructure, and you will continuously research and evaluate new technologies and practices that improve the reliability and efficiency of our systems. Your leadership will be crucial in guiding the SRE team through these challenges and fostering a culture of problem-solving and innovation.