TikTok - Seattle, WA
posted 3 days ago
TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., created to enhance focus and governance on data protection policies and content assurance protocols to keep U.S. users safe. The teams within USDS are dedicated to providing oversight and protection of the TikTok platform and U.S. user data, ensuring that millions of Americans can continue to use TikTok for learning, earning, expressing creativity, or entertainment.

The Site Reliability Engineering (SRE) team within the AML (Applied Machine Learning) division combines system engineering with machine learning to develop and operate a large-scale AI/ML recommendation system for users in the United States and globally. As a Site Reliability Engineer, you will have the opportunity to sharpen your skills in coding, performance analysis, and large-scale systems operation. You will play a crucial role in shaping the future of AML systems and making a tangible impact on TikTok users.

The SRE team is committed to collaboration and cross-functional partnerships, and currently follows a hybrid work schedule requiring employees to work in the office three days a week, with flexibility as directed by management. This model is regularly reviewed, and specific requirements may change over time.

In this role, you will be responsible for designing, building, and maintaining highly available, scalable, and fault-tolerant systems. You will monitor and analyze system performance, identifying and resolving issues proactively to prevent user impact. Additionally, you will develop and maintain automated monitoring, alerting, and incident response systems, collaborating closely with software engineering teams to ensure applications are designed with reliability, scalability, and performance in mind. You will also implement and maintain security best practices, ensuring compliance with regulatory requirements, and participate in on-call rotations to respond to issues and incidents as they arise.