Toyota Motors - Plano, TX

posted 2 months ago

Full-time - Senior
Hybrid - Plano, TX
Transportation Equipment Manufacturing

About the position

Toyota Financial Services (TFS) is launching a new Site Reliability Engineering (SRE) team and is seeking a Senior Manager to lead this initiative. You will build the team from the ground up and establish processes that ensure the reliability, performance, and scalability of our systems and applications. The role is pivotal to delivering a best-in-class customer experience in an innovative, collaborative environment.

You will work hands-on alongside engineers, coding, debugging, and implementing automation to create a more stable and robust application environment. You will define and implement strategies for system reliability, performance, and scalability, and develop Service Level Objectives (SLOs) and Service Level Agreements (SLAs) aligned with business goals. You will design and deploy monitoring, alerting, and incident management systems; implement and refine disaster recovery and business continuity plans; and lead responses to major incidents, coordinating with stakeholders through resolution. You will also conduct post-incident reviews, drive continuous improvement, and identify opportunities to streamline operations through automation.

Beyond day-to-day operations, you will oversee the development and implementation of monitoring and incident management tools, work with engineering, product, and infrastructure teams on reliability goals, and participate in architectural reviews, providing input on reliability and scalability. Finally, you will recruit, build, and lead the new SRE team with clear objectives and metrics, fostering a collaborative team culture and supporting professional development.

Responsibilities

  • Support engineers with hands-on coding, debugging, and automation to build a more stable and robust application environment.
  • Foster a collaborative team culture and support professional development.
  • Define and implement strategies for system reliability, performance, and scalability.
  • Develop Service Level Objectives (SLOs) and Service Level Agreements (SLAs) aligned with business goals.
  • Design and deploy monitoring, alerting, and incident management systems.
  • Implement and refine disaster recovery and business continuity plans.
  • Lead major incident responses and coordinate with stakeholders for resolution.
  • Conduct post-incident reviews and drive continuous improvement.
  • Identify and implement automation opportunities to streamline operations.
  • Oversee the development and implementation of monitoring and incident management tools.
  • Work with engineering, product, and infrastructure teams on reliability goals.
  • Participate in architectural reviews, providing input on reliability and scalability.
  • Recruit, build, and lead the new SRE team with clear objectives and metrics.

Requirements

  • 7+ years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a leadership role.
  • Demonstrated experience in building and managing teams, with a proven track record of achieving high system reliability and performance.
  • Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes).
  • Proficiency in scripting and automation (e.g., Python, Bash) and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
  • Strong leadership capabilities, with excellent problem-solving and decision-making skills.
  • Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.

Benefits

  • A work environment built on teamwork, flexibility, and respect.
  • Professional growth and development programs to help advance your career, as well as tuition reimbursement.
  • Team Member Vehicle Purchase Discount.
  • Toyota Team Member Lease Vehicle Program (if applicable).
  • Comprehensive health care and wellness plans for your entire family.
  • Flextime and virtual work options (if applicable).
  • Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
  • Paid holidays and paid time off.
  • Referral services related to prenatal services, adoption, childcare, schools and more.
  • Tax-Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA).
  • Relocation assistance (if applicable).