Site Reliability Engineering (SRE) Leader - Remote or Houston

Halliburton - Houston, TX

posted 2 months ago

Part-time, Full-time - Senior

Remote - Houston, TX

Support Activities for Mining

About the position

As a Site Reliability Engineering (SRE) Leader at Halliburton, you will be at the forefront of maintaining and enhancing the reliability and performance of our iEnergy platform. This role is pivotal in ensuring that our cloud infrastructure operates seamlessly, providing the necessary uptime and scalability for our global operations. You will lead a dedicated team responsible for overseeing the deployment and operational management of our services, interfacing with key stakeholders across Engineering, Sales, Operations, and our customer base. Your leadership will be crucial in fostering a culture of innovation and excellence within the team, driving initiatives that reduce operational toil and enhance automation in our workflows.

In this position, you will leverage your extensive experience in managing large-scale cloud operations, particularly within AWS and Azure environments. You will be responsible for leading globally distributed teams that operate 24/7, ensuring that our services are resilient and responsive to the needs of our users. Your role will involve not only overseeing incident management and resolution but also implementing best practices in automation and observability. You will work closely with both Software and Infrastructure Engineering teams to ensure that our design and operational processes are robust and efficient, contributing to the overall success of our cloud services.

The ideal candidate will possess a strong technical background in SRE, with a proven track record of leading teams in enterprise cloud and infrastructure operations. You will be expected to communicate effectively, both verbally and in writing, to convey complex technical concepts to diverse audiences. Your passion for technology and leadership will inspire your team to achieve their best, driving continuous improvement in our operational practices and service delivery.

Responsibilities

Lead a team responsible for running iEnergy to maintain uptime and scale deployments.
Interface with key stakeholders including Engineering, Sales, Operations, and Customers.
Oversee and improve incident management processes, driving rapid resolution of incidents.
Implement best practices in automation and observability tooling for application performance and configuration management.
Design and operate infrastructure, networking, and storage solutions for AWS, Azure, and private cloud environments.
Reduce operational toil and increase automation in running workloads.
Foster a culture of transparency, trust, and technical strength within the team.
Collaborate with Software and Infrastructure Engineering organizations in design and operations.

Requirements

8+ years of SRE experience in enterprise cloud and infrastructure operations.
Deep hands-on technical expertise in SRE functions, especially in partnerships with Engineering, Customer, and Product Support functions.
Strong technical and team leadership capabilities with global teams.
Bachelor's or Master's degree in Computer Science or Engineering.
AWS or Azure Specialty certification (e.g., SysOps Administrator) is ideal.

Nice-to-haves

Experience with multi-tenant services operations.
Familiarity with best-of-breed automation and observability tooling.
Experience in structured engineering and operations processes throughout the lifecycle of deployment.

Benefits

Competitive salary based on experience and qualifications.
Opportunities for career growth and development within a global company.
Work in a diverse and inclusive environment.
