Site Reliability Engineer, Data Platform-TikTok-US-Tech Services

$158,080 - $321,632/Yr

TikTok - Seattle, WA

posted 3 days ago

Full-time - Mid Level

Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

TikTok is the leading destination for short-form mobile video, with a mission to inspire creativity and bring joy to over 1 billion users globally. The company is seeking a Site Reliability Engineer (SRE) for its Data Platform Team, which tackles challenges in data infrastructure and data products. The team manages components including the Query Engine, Logging and Data Ingestion Infrastructure, Experimentation Platform, and Workflow Management Platform. Its primary goal is to support ad-hoc and interactive queries, batch pipelines, logging, and ingestion of large volumes of real-time data, and to facilitate A/B testing for product feature launches.

As a Site Reliability Engineer in the data platform area, you will play a crucial role in managing one of the largest data platforms in the world. Your responsibilities will include ensuring that data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective. You will engage in the entire service lifecycle, from inception and design to deployment, operation, and refinement. You will also maintain live services by measuring and monitoring their availability, latency, and overall system health, while practicing sustainable incident response and conducting blameless postmortems. Establishing engineering best practices for both technical and non-technical personnel will also be a key part of your role.

The position offers the opportunity to design, build, and deliver various systems as a software engineer, contributing to reliable, scalable, robust, and extensible big data systems that support TikTok's core products and business objectives. This role is ideal for individuals who are passionate about applying their technical skills to improve the reliability and performance of large-scale data systems.

Responsibilities

Engage in and improve the entire service lifecycle, from inception and design through deployment, operation, and refinement.
Ensure that data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Practice sustainable incident response and blameless postmortems.
Establish engineering best practices for engineers as well as non-technical staff.
Design and implement reliable, scalable, robust and extensible big data systems that support core products and business.

Requirements

BS or MS degree in Computer Science or related technical field or equivalent practical experience.
Experience with Big Data technologies (Hadoop, MapReduce, Hive, Spark, Metastore, Presto, Flume, Kafka, ClickHouse, Flink, etc.).
Experience performing data analysis, data ingestion, and data integration.
Solid communication and collaboration skills.

Benefits

100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents.
Health Savings Account (HSA) with a company match.
Dental, Vision, Short- and Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care.
10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) and 10 paid sick days per year.
12 weeks of paid parental leave and 8 weeks of paid supplemental disability.
Mental and emotional health benefits through EAP and Lyra.
401(k) company match, plus gym and cellphone service reimbursements.
