TikTok - San Jose, CA
posted 3 days ago
As a Site Reliability Engineer (SRE) on TikTok's Global E-Commerce team, you will help ensure the reliability and performance of our mission-critical e-commerce platform. This position is part of a global on-call rotation, in which you will be responsible for Tier-1 online incident response and DevOps support. Your primary focus will be maintaining service levels for our revenue-generating e-commerce platform, including all supporting infrastructure and services. You will work in a cloud-native environment that emphasizes service reliability, highly scalable design, and effective release management.

In this role, you will define service level indicators and data-driven objectives, and develop DevOps and SRE standards, processes, and methodologies to improve uptime, latency, and overall system health. You will work closely with engineering and product teams to ensure that essential stability and maintainability requirements, such as capacity planning and launch reviews, are met, enabling transparent service delivery to our customers.

You will also design strategies for risk detection and mitigation, disaster recovery simulations, release management, cost optimization, and engineering quality. Automation will be a significant part of your responsibilities, with a focus on infrastructure-as-code, scalability, and service resiliency. You will also implement best practices for incident management and conduct post-mortems to keep our services operational and efficient.