TikTok - San Jose, CA
posted 4 days ago
TikTok is the leading destination for short-form mobile video, with a mission to inspire creativity and bring joy. Our Compute Platform SRE team is a newly established group responsible for supporting all Big Data services and products across the company. We ensure the reliability of TikTok's major data warehouse products, services, and query engines, including ClickHouse, Spark, Presto, and Doris.

As a Tech Lead Manager, you will lead a global SRE team distributed across the US and Singapore, focused on maintaining high service reliability and optimizing performance. You will be responsible for upholding Service Level Agreements (SLAs), managing incident response, and developing robust incident management mechanisms. Your role will also involve continuous performance optimization, infrastructure automation, and collaboration with product and development teams to integrate reliability into the software lifecycle.

In this position, you will assess and forecast infrastructure needs based on growth patterns and upcoming initiatives, while staying current with industry trends and best practices. You will lead efforts to troubleshoot and resolve service incidents, coordinate with cross-functional teams, and implement proactive measures to prevent service disruptions.

Your leadership will be crucial in shaping the future of the Compute Platform SRE team, driving impact for TikTok and the communities we serve. We are looking for someone who is passionate about computer science and Internet technology, with a strong sense of ownership and the ability to collaborate effectively across time zones.