Senior Director Network Reliability Engineering

Request Technology - Plano, TX

posted 2 months ago

Full-time - Senior

Plano, TX

Administrative and Support Services

About the position

The Senior Director of Network Reliability Engineering at our prestigious enterprise company is a pivotal role focused on enhancing the operations and availability of network services. This position oversees the engineering functions across all network services technology domains, with a strong emphasis on improving availability, productivity, and efficiency through automation and instrumentation. The ideal candidate will lead the organization in adopting an API-driven approach to network services, enabling seamless integration with various network tools and services and giving other teams easy access to these resources.

In this role, you will run regular automated audits of network infrastructure to ensure compliance with industry standards and best practices. You will spearhead the development and integration of self-service tools that empower other teams to troubleshoot and resolve network-related issues independently. You will collaborate with cross-functional teams to design and implement tools that automate end-to-end processes within the network infrastructure, and to identify further opportunities to improve internal processes through automation.

The Senior Director will also be responsible for building and leading a sustainability and reliability engineering function that focuses on infrastructure availability and performance. This includes developing automated test suites, maintaining clear documentation of solutions, and implementing comprehensive network service monitoring to ensure optimal uptime and performance. You will define and measure key Service Level Objectives (SLOs) related to availability, performance, incidents, and chronic problems, and establish a capacity planning framework to prevent downtime due to capacity issues.

Your leadership will extend to owning the end-to-end availability and performance of critical services, with a goal of automating responses to non-exceptional service conditions. You will partner with application and business teams to ensure that high-quality products are developed and released into production. You will work closely with architecture, customers, and product teams to specify and document solutions and practices, fostering a DevOps culture that emphasizes continuous operations and support. Encouraging teamwork and collaboration across teams will be a key focus, as you lead with an emphasis on productivity, efficiency, respect, and cultural sensitivity.

Responsibilities

Lead the organization in building an API-driven approach to network services that enables seamless integration of network tools with other network-related services.
Perform automated regular network infrastructure audits to ensure continuous compliance with best practices and industry standards.
Lead the development and/or integration of self-service tools for other teams to troubleshoot and resolve network-related issues.
Collaborate with other teams to design and implement tools that will help automate end-to-end processes within network infrastructure.
Identify opportunities to automate repetitive tasks and improve the quality of internal processes.
Develop automated test suites and maintain clear documentation of solutions developed.
Build and lead the sustainability and reliability network engineering function that owns infrastructure availability and performance.
Build tools that drive automation and proactive/predictive alerting, supported by a strong data analytics tool set for identifying areas of improvement.
Implement comprehensive network service monitoring to ensure uptime and performance, including synthetic, real-user, system, and application performance monitoring, as well as dashboards.
Define, measure, and meet key Service Level Objectives including availability, performance, incidents and chronic problems.
Stand up a capacity planning framework to regularly measure performance and capacity, ensuring that there is no downtime due to capacity issues.
Own end-to-end availability and performance of critical services and build automation to prevent problem recurrence; eventually automate response to all non-exceptional service conditions.
Partner with application and business team members to ensure that high-quality products are developed and released into production.
Work closely with Architecture, Customers and Product to specify and document solutions and practices.
Build a DevOps culture that provides high-quality, continuous operations and ongoing support, ensuring that critical service level metrics, customer requirements, and financial objectives are met.
Encourage and build teams that work together without silos.
Lead teams with a focus on productivity, efficiency, respect, and cultural sensitivity.

Requirements

15+ years of directly related professional experience.
College or advanced degree and/or a minimum of 12 years of relevant IT and management experience.
Proven professional experience with operational and organizational management, leadership of teams, and enterprise-wide technology strategy.
Good interpersonal and collaboration skills, with the ability to communicate effectively with small and large groups of business partners and senior leadership.
Strong facilitation, organization, communication, and writing skills.
Detail-driven with respect to documentation and communication.

Benefits

Bonus eligibility
