TikTok - San Jose, CA
posted 27 days ago
TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. The MLOps - Global SRE team plays a crucial role in ensuring the stability and efficiency of machine learning systems under the Global Monetization Products and Technology organization. This position focuses on the operational aspects of machine learning models, encompassing data preparation, development, training, deployment, and serving.

As a Senior Machine Learning Ops Engineer, you will be responsible for setting Service Level Objectives (SLOs) for online machine learning serving systems and maintaining their stability. You will also oversee the stability of offline machine learning training tasks, working to improve their success rates. Additionally, you will roll out GPU model training in non-China regions and ensure the stability of AIGC-related machine learning tasks. Resource management and planning for machine learning resources, including cost and budget considerations, will also be part of your responsibilities.

At TikTok, we believe that every challenge is an opportunity to learn, innovate, and grow as a team. We are committed to creating an inclusive environment where employees are valued for their unique skills and perspectives. Our platform connects people globally, and we strive to reflect the diverse communities we serve. Join us in our mission to inspire creativity and bring joy, and be part of a team that drives impact for ourselves, our company, and the communities we serve.