TikTok - Los Angeles, CA
posted 3 months ago
The Intelligent Creation Site Reliability Engineering (SRE) Team at TikTok is on a mission to enhance the content creation platform through the application of visual intelligence and artificial intelligence. As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our services, which are essential for empowering content creators and users alike. This position is ideal for individuals who are passionate about software reliability and enjoy tackling complex challenges in a dynamic environment. You will work closely with product teams to implement the latest AI Generative Content, Intelligent Editing, and Content Understanding technologies, making a tangible impact on TikTok users around the world.

In this role, you will be responsible for deploying and maintaining the content creation platform, which includes training, inference, and pipeline orchestration in a production environment. You will continuously integrate and deploy services to the cloud, ensuring optimal performance and reliability. Your expertise will be vital in developing and maintaining software, identifying performance bottlenecks, and debugging issues. Additionally, you will engage in service capacity planning, demand forecasting, and system tuning to enhance the overall efficiency of our services.

The SRE team is dedicated to monitoring the health and performance of over 100 microservices that power TikTok's content creation platform. You will intervene as needed to rectify outages or issues, ensuring that our platform remains robust and reliable. This position requires a collaborative mindset, as you will work closely with cross-functional teams to foster effective partnerships and enhance our service-oriented architecture governance.

TikTok promotes a hybrid work schedule, requiring employees to work in the office three days a week, with flexibility based on departmental needs.