TikTok - San Jose, CA
Posted 3 days ago
As a Senior Site Reliability Engineer (SRE) on TikTok's Global E-Commerce team, you will play a crucial role in ensuring the reliability and performance of our mission-critical e-commerce platform. This position is part of a global SRE on-call rotation, in which you will be responsible for Tier-1 online incident response and DevOps support. Your primary focus will be maintaining service levels for our revenue-generating e-commerce platform and its supporting infrastructure, with an emphasis on service reliability, highly scalable design, and effective release management in a cloud-native environment.

In this role, you will define service level indicators and data-driven objectives, and develop DevOps and SRE standards, processes, and methodologies to improve uptime, latency, and overall system health. Collaboration is key: you will work closely with engineering and product teams to ensure that essential stability and maintainability requirements, such as capacity planning and launch reviews, are met, facilitating transparent service delivery to our customers.

You will also design strategies for risk detection and mitigation, disaster recovery, release management, cost optimization, and engineering quality. Automation will be a significant part of your responsibilities, with a focus on infrastructure-as-code, scalability, and service resiliency. Additionally, you will implement best practices for incident management and post-mortems while participating in on-call rotations, ensuring that we learn from incidents and continuously improve our systems and processes.