Brightspeed - Charlotte, NC
posted 4 months ago
At Brightspeed, we are seeking a Principal Site Reliability Engineer to join our growing team. In this pivotal role, you will implement and maintain monitoring systems that track the performance and availability of our business-critical systems and infrastructure. You will use metrics to identify trends and potential issues so that our services remain reliable and scalable, and you will collaborate closely with development teams, operations, and other stakeholders to ensure that new services and features meet high standards of reliability and performance.

As a Principal Site Reliability Engineer, your duties will include responding to system outages and performance issues, performing root cause analysis to prevent recurrence, and developing scripts and tools to automate repetitive tasks such as deployment, scaling, and monitoring. You will work on reducing latency and improving the speed of data transmission across our network. You will also define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to confirm that our services meet required performance and availability targets. Additionally, you will conduct postmortems after incidents to identify areas for improvement, and work with lead application owners and internal change management to review code changes and support deployments.

You will also lead a team of site reliability engineers, both onshore and offshore, mentoring them in the support activities required for system reliability. The ability to communicate effectively with multiple audiences, including senior business and IT leadership, technology teams, and business teams, is essential for success in this position.