TikTok - Mountain View, CA
posted 4 days ago
As a Site Reliability Engineer (SRE) on the Data Platform team at TikTok, you will play a crucial role in managing and enhancing one of the largest data platforms in the world, which directly supports the TikTok app. Your primary responsibility will be to ensure that the data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective. This position is part of the U.S. Data Security (USDS) division, which focuses on data protection policies and content assurance protocols to keep U.S. users safe.

In this role, you will engage in and improve the entire lifecycle of services, from inception and design through deployment, operation, and refinement. You will maintain services once they are live by measuring and monitoring availability, latency, and overall system health. This includes practicing sustainable incident response and conducting blameless postmortems to learn from incidents and improve future performance.

You will also be responsible for establishing engineering best practices for both technical and non-technical team members, keeping everyone aligned with the goals of reliability and efficiency. Additionally, you will design and implement reliable, scalable, robust, and extensible big data systems that support core products and business objectives. This role requires a strong foundation in software and systems engineering, as well as a commitment to collaboration and cross-functional partnerships.

TikTok follows a hybrid work schedule: employees work in the office three days a week, though this may be adjusted based on departmental needs.