Site Reliability Engineer - USDS

$129,960 - $246,240/Yr

TikTok - Seattle, WA

posted 3 months ago

Full-time - Mid Level

Seattle, WA

Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. As part of the U.S. Data Security (USDS) team, the Site Reliability Engineer (SRE) will play a crucial role in ensuring the reliability and performance of our services. This position combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. The SRE will engage in and improve the entire lifecycle of services, from inception and design through development, capacity planning, launch reviews, deployment, operation, and refinement.

The role requires designing and implementing software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance. The SRE will also be responsible for scaling systems sustainably through automation and for evolving system reliability, efficiency, and velocity by advocating for necessary changes. Additionally, the position involves practicing sustainable user support and incident response, and conducting blameless postmortems to learn from incidents and improve future performance.

At TikTok, we embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction. The ideal candidate will have a strong background in programming and systems engineering, with experience managing complex challenges of scale. This role is essential in maintaining the integrity and performance of TikTok's services, ensuring that millions of users can continue to express themselves creatively and be entertained safely.

Responsibilities

Engage in and improve the whole lifecycle of services from inception and design, throughout development, capacity planning, and launch reviews, to deployment, operation, and refinement.
Design and implement software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.
Scale systems sustainably through mechanisms such as automation; evolve system reliability, efficiency, and velocity by pushing for changes.
Practice sustainable user support, incident response, and blameless postmortems.

Requirements

Bachelor's degree in Computer Science or a related technical field with 3-5+ years of experience.
Experience programming in one of the following languages: C, C++, Java, Python, Go, or Rust.
Familiarity with Unix/Linux system internals, networking, and distributed systems.

Nice-to-haves

Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
Experience in designing and analyzing large-scale distributed systems.
Strong skills in problem-solving and communication.

Benefits

100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents.
Health Savings Account (HSA) with a company match.
Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care.
10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) and 10 paid sick days per year.
12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
Mental and emotional health benefits through EAP and Lyra.
401(k) company match, plus gym and cellphone service reimbursements.
