TikTok - New York, NY
posted 3 days ago
TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., created to strengthen focus on and governance of data protection policies and content assurance protocols to ensure the safety of U.S. users. The USDS team is dedicated to providing oversight and protection of the TikTok platform and U.S. user data, allowing millions of Americans to continue using TikTok for learning, earning, self-expression, and entertainment. The teams within USDS include Trust & Safety, Security & Privacy, Engineering, User & Product Ops, and Corporate Functions, all working together to fulfill this commitment.

As a Site Reliability Engineer (SRE) within the AML (Applied Machine Learning) team, you will combine systems engineering with machine learning to develop and operate a massively distributed AI/ML recommendation system serving users in the United States and globally. This role offers the opportunity to strengthen your skills in coding, performance analysis, and large-scale systems operation, while also allowing you to shape the future of AML systems and make a significant impact on TikTok users.

In this position, you will be responsible for designing, building, and maintaining highly available, scalable, and fault-tolerant systems. You will monitor and analyze system performance, proactively identifying and resolving issues before they affect users. You will also develop and maintain automated monitoring, alerting, and incident response systems, collaborating closely with software engineering teams to ensure applications are designed with reliability, scalability, and performance in mind. You will prioritize security best practices and participate in on-call rotations to respond to incidents, conducting root cause analyses and implementing preventive measures to minimize future risks.