Staff Software Engineer, Site / Service Reliability - Live Operations

$183,300 - $256,600/Yr

Riot Games - Los Angeles, CA

posted 3 months ago

Full-time - Senior

Miscellaneous Manufacturing

About the position

Software Reliability Engineering at Riot is tasked with addressing the most complex technology challenges that arise as the company expands into a multi-game ecosystem. As a Staff Engineer on this team, you will collaborate with various engineering teams across Riot, engaging with a diverse range of technical stacks. This role demands a deep understanding of Riot's architecture, allowing you to prioritize and deploy your team effectively to ensure players enjoy consistent and reliable engagement with Riot's games. You will be responsible for building alignment among multiple technology stakeholders and fostering the growth of your engineers. Your role will involve coordinating with technical leads across the organization while aligning your priorities with Riot's strategic objectives. If you thrive on tackling high-scale service development challenges and enjoy seeing plans come to fruition, this position is designed for you.

You will be expected to maintain and evolve Riot's technical understanding of its multifaceted architectures, ensuring that central technology teams have the necessary insights into the performance of live services. Additionally, you will help shape and lead your team into a competent Tier 1 Site Reliability group, design and implement services to enhance reliability and visibility, and establish long-lasting standards across various technical stacks. Your responsibilities will also include providing critical support and maintenance for existing platforms, being on rotational on-call for live product support, conducting meaningful code reviews, producing comprehensive user documentation, and mentoring a junior engineering team to become subject matter experts in observability, triage, and incident response.

Responsibilities

Maintain and evolve Riot's technical understanding of its multifaceted architectures
Ensure Riot central technology teams have the necessary visibility into how our live services are performing
Help shape the team and lead it to become a competent Tier 1 Site Reliability group
Design, implement and modify services to enhance reliability and visibility
Establish meaningful, long-lived standards across multiple technical stacks
Provide emergent, critical support and maintenance to existing platforms
Be on rotational on-call for live product support and operational assessment
Provide meaningful code review for other members of the team
Produce comprehensive user documentation around your implemented solutions
Mentor, guide and level up a junior engineering team to be subject matter experts in observability, triage and incident response

Requirements

Bachelor's or Master's degree in Computer Science or a related field, or relevant professional experience
5+ years of relevant experience
Experience designing, prioritizing and maintaining high-capacity, high-availability, and high-performance software, especially back-end services
Demonstrated ability to work across multiple organizations and generate alignment on technical standards
Demonstrated experience mentoring engineers to grow technically on your teams
Demonstrated experience working in container-based ecosystems and with a container scheduler (e.g. Marathon, Mesos, Kubernetes, GKE, Amazon ECS)
Experience with distributed systems, specifically microservices
Experience with API design, preferably using REST
Understanding of networking, from HTTP down to the network layer (TCP/IP, routing, etc.)
Understanding of relational databases such as MySQL

Nice-to-haves

2+ years working in a high-performance Site Reliability capacity
Experience building high-quality software in languages like Go, Java, Python, or JavaScript
Familiarity with Site Reliability best practices
Experience building teams from the ground up
Experience with CI/CD pipelines, ideally Jenkins and/or GitHub Actions
Understanding of software performance and the impact of latency in online games
Experience with AWS (or comparable cloud environments)

Benefits

Open paid time off policy
Flexible work schedules
Medical insurance
Dental insurance
Life insurance
Parental leave for you, your spouse/domestic partner and children
401k with company match
Short and long-term disability insurance
Vision insurance
