TikTok - Seattle, WA
posted 3 months ago
TikTok is the leading destination for short-form mobile video, with a mission to inspire creativity and bring joy to over 1 billion users globally.

The Data Platform Team at TikTok addresses challenges in data infrastructure and data products. The team is responsible for several critical components, including the Query Engine, Logging and Data Ingestion Infrastructure, Experimentation Platform, and Workflow Management Platform. Its primary goal is to support ad-hoc and interactive queries, manage batch pipelines, log and ingest large volumes of real-time data, and facilitate A/B testing for all product feature launches.

As a Site Reliability Engineer (SRE) within the Data Platform area, you will manage services and infrastructure that are part of one of the largest data platforms in the world. You will ensure that the data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective. You will engage in the entire lifecycle of service management, from inception and design through deployment, operation, and refinement. As a software engineer, you will also design, build, and deliver various systems that contribute to the success of TikTok's data initiatives.

Once services are live, you will maintain them by measuring and monitoring availability, latency, and overall system health. You will practice sustainable incident response and conduct blameless postmortems to improve system reliability. You will also establish best engineering practices for both technical and non-technical team members, ensuring that the systems you design and implement are reliable, scalable, robust, and extensible, and that they support TikTok's core products and business objectives.