Lead Software Engineer - Site Reliability Engineering

$112,151 - $262,854/Yr

Cable

posted 3 months ago

Full-time - Senior

Support Activities for Mining

About the position

As the Lead Software Engineer - Site Reliability Engineering (SRE) at Comcast, you will play a pivotal role in ensuring the reliability and performance of the FreeWheel platforms. The position requires a deep understanding of large-scale distributed systems, and you will be responsible for their availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. You will design, analyze, and troubleshoot these systems, debug and optimize code, and automate routine tasks.

You will join a team that combines software engineering and technology infrastructure expertise. Your responsibilities will include leading technical solutions that improve the reliability and efficiency of FreeWheel platforms, supporting high-profile live events, and collaborating closely with developers and tech leads throughout the software release cycle. You will also author infrastructure as code, dedicate approximately 30% of your time to developing tools in Python or Golang, and advocate for best practices in engineering and technical operations.

In this role, you will lead on-call shifts, incident prevention, and incident response, and you will train and coach junior team members. You will need to exercise independent judgment and discretion as you navigate complex technical challenges and drive improvements in production quality and operational efficiency. The position requires working Eastern Standard Time hours, including weekends during on-call rotations, and a proactive approach to problem-solving and cross-team collaboration.

Responsibilities

Own the reliability and technical operations of the FreeWheel TV Platform ad-serving component(s).
Lead technical solutions for measuring and improving the reliability, quality, and efficiency of FreeWheel platforms.
Lead a variety of complex analytical duties in the planning, deployment, testing, and evaluation of FreeWheel products.
Possess in-depth working knowledge of FreeWheel platforms, infrastructure, internal processes, and teams/partners.
Support FreeWheel-powered live events such as the Super Bowl, Olympic Games, March Madness, and FIFA World Cup.
Plug into the software release cycle, working closely with developers and tech leads to ensure software releases are well designed, planned, implemented, released, and monitored.
Lead the design and implementation of infrastructure as code, applying best practices, appropriate tooling, and quality assurance.
Dedicate approximately 30% of your time to tools development in Python or Golang.
Lead technical solutions for infrastructure and application management, monitoring, and operations with standardization and automation focus.
Leverage engineering methodologies and technical knowledge in specific areas of focus.
Lead code-level debugging of issues escalated to the team.
Lead on-call shifts, incident prevention, incident response, and retrospectives.
Advocate for engineering and technical operations procedures, policies, processes and SRE best practices.
Partner with developers and vendors to identify and drive improvements in areas including production quality, operational efficiency, and engineering productivity.
Provide support and influence for the Cybersecurity program needs such as patching, vulnerability cleanup, secure server configuration, testing and validation, technical controls implementation and cybersecurity incident remediation efforts.
Provide training and coaching to peers and more junior SRE team members.

Requirements

Bachelor's degree in computer science, a related engineering field, or equivalent practical experience.
7 years of prior software engineering experience in one of the following programming languages: Python, Golang, JavaScript.
5 years of prior technical operations experience running business-critical application(s) on public cloud services (AWS experience is a big plus): VPC, subnets, network access control lists, security groups, EC2 instances, S3 buckets, IAM, Route 53, Lambda.
5 years of prior experience with SDLC tools: containers, Kubernetes, Docker, Salt/Ansible/Chef/Puppet, Jenkins, Git.
Prior experience with Linux administration, network security, and system infrastructure.
Excellent communication and collaboration skills, both within and across teams and continents.

Nice-to-haves

Prior experience in supporting business-critical services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Technical leadership and influence demonstrated in focused product/tech areas and practices.
Prior experience in providing technical solutions at an internet company.

Benefits

Paid Time off
Physical Wellbeing
Financial Wellbeing
Emotional Wellbeing
Life Events + Family Support
