Comcast - Reston, VA
posted 3 months ago
As a Site Reliability Engineer (SRE) at FreeWheel, a Comcast company, you will play a crucial role in the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of the FreeWheel platforms. This position requires you to design, analyze, and troubleshoot large-scale distributed systems, debug and optimize code, and automate routine tasks. You will be part of a diverse team that combines software and technology infrastructure expertise, providing subject matter expertise and resolving complex break/fix scenarios. Collaboration with engineering, vendors, and client services will be essential to deliver successful technical solutions. You will work with limited supervision, following operational practices and independently determining approaches for non-routine solutions.
Your core responsibilities will include ensuring the reliability and technical operation of the FreeWheel TV Platform UI and API components. You will implement technical solutions aimed at measuring and improving the reliability, quality, and efficiency of FreeWheel platforms. This role involves performing complex analytical duties in the planning, deployment, testing, and evaluation of FreeWheel products, requiring an in-depth working knowledge of the platforms, infrastructure, internal processes, and teams/partners. You will support high-profile live events powered by FreeWheel, such as the Super Bowl, Olympic Games, March Madness, and FIFA World Cup. Additionally, you will plug into the software release cycle, working closely with developers to ensure that software releases are well designed, planned, implemented, released, and monitored. You will also participate in the design and implementation of infrastructure as code, focusing on best practices, tool use, and quality assurance.
Your role will involve engineering technical solutions for infrastructure and application management, monitoring, and operations with a focus on standardization and automation. You will leverage engineering methodologies and technical knowledge in specific areas of focus, perform code-level debugging on escalated issues, and support incident prevention, response, and retrospectives. As an advocate for engineering and technical operations procedures, policies, processes, and SRE best practices, you will work closely with developers and vendors to identify and drive improvements in production quality, operational efficiency, and engineering productivity. Furthermore, you will provide support for the Cybersecurity program, including patching, vulnerability cleanup, secure server configuration, testing and validation, technical controls implementation, and incident remediation efforts. You will also train and coach peers and junior SRE team members. The role requires consistent exercise of independent judgment and discretion in significant matters.