Disney Experiences - Orlando, FL

posted 4 days ago

Full-time - Manager
Orlando, FL

About the position

The Manager of Site Reliability Engineering at Disney Experiences is a leadership role responsible for overseeing a team of site reliability engineers. This position focuses on defining, measuring, and improving service levels for various applications while mentoring team members and aligning reliability plans with the business strategy of DX Tech and Digital.

Responsibilities

  • Oversee finances and budgets in MyPPM, ensuring accurate billing processes and contributing to forecasting and accrual processes.
  • Work with the vendor management team to maintain the optimal mix of cast members, contractors, and managed services.
  • Manage the work of the team in Jira and maintain documentation in Confluence.
  • Lead the evolution of DevOps practices within the broader team framework, including improvements to observability.
  • Manage the SRE team to deliver monitoring and observability for development and business users.
  • Work with development teams to develop and manage mutually agreeable service levels for critical business applications.
  • Drive teams to consult, design, build, and support development pipelines, automate infrastructure and operations, and build telemetry for monitoring.
  • Lead the team in developing technology engineering skills using AWS and Google Cloud Platform for various workloads.
  • Develop and advocate strategic directions for reliability, observability, and recovery, focusing on operational excellence and application stability.
  • Engage in estimation and planning across the organization, providing technical recommendations and feedback.
  • Proactively track and assess new technologies to inform strategic decision-making.

Requirements

  • Minimum 8 years of related work experience.
  • Demonstrated leadership in implementing observability principles across complex systems and environments.
  • Extensive experience with modern software delivery tools such as GitHub, GitLab, Harness.io, LaunchDarkly, AWS CodeDeploy, and Azure DevOps.
  • Proficiency in designing and managing highly scalable and resilient infrastructure using tools like Terraform, CloudFormation, Ansible, and Chef.
  • Outstanding communication and leadership abilities to ensure effective team growth and development.
  • A visionary who motivates teams and fosters creativity.

Nice-to-haves

  • Experience leveraging AI for predictive insights to drive measurable, continuous improvement in system reliability.