Toyota Motors - Plano, TX
Toyota Financial Services (TFS) is launching a new Site Reliability Engineering (SRE) team and is seeking a Senior Manager to lead this initiative. You will build the SRE team from the ground up and establish robust processes that ensure the reliability, performance, and scalability of our systems and applications. This role is pivotal to creating a best-in-class customer experience in an innovative, collaborative environment. You will also support engineers with hands-on coding, debugging, and automation that make the application environment more stable.

Your leadership will be crucial in defining and implementing reliability, performance, and scalability strategies, and in developing Service Level Objectives (SLOs) and Service Level Agreements (SLAs) aligned with business goals. In this position, you will design and deploy monitoring, alerting, and incident management systems; implement and refine disaster recovery and business continuity plans; and lead the response to major incidents, coordinating with stakeholders through resolution. You will also conduct post-incident reviews, drive continuous improvement, and identify and implement automation opportunities that streamline operations.

You will oversee the development and implementation of monitoring and incident management tools, work with engineering, product, and infrastructure teams on reliability goals, and participate in architectural reviews, providing input on reliability and scalability.

Additionally, you will recruit and lead the new SRE team, setting clear objectives and metrics, fostering a collaborative team culture, and supporting professional development.