TikTok - Mountain View, CA

posted 3 days ago

Full-time - Mid Level
Mountain View, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

As a Site Reliability Engineer (SRE) within TikTok's U.S. Data Security (USDS) division, you will play a crucial role in ensuring the reliability and performance of the TikTok platform. This position is designed for individuals who are passionate about maintaining high service levels and enhancing the user experience through robust system management.

You will gain a comprehensive understanding of the various components and services that power TikTok, allowing you to monitor and maintain these systems effectively to meet established service-level agreements (SLAs) and service-level objectives (SLOs). Your responsibilities will include measuring and monitoring system availability, performance, and overall health, ensuring that services are reliable, fault-tolerant, and efficiently scalable.

In this role, you will collaborate with a global team to address site-up issues, providing user support and incident response and conducting postmortems to learn from incidents. You will also be tasked with scaling systems sustainably through automation and advocating for changes that enhance system reliability, efficiency, and velocity. This position requires a proactive approach to problem-solving and a commitment to continuous improvement, as you will be expected to evolve the systems you manage to better serve TikTok's user base.

TikTok's mission is to inspire creativity and bring joy, and as part of the USDS team, you will contribute to this mission by ensuring that the platform remains a safe and enjoyable space for millions of users. The work environment is hybrid, requiring employees to be in the office three days a week, fostering collaboration and cross-functional partnerships. This role is subject to strict national security-related screening due to the sensitive nature of the data and information you will be working with.

Responsibilities

  • Gain a solid understanding of the various components and services that power the TikTok experience.
  • Maintain services to meet service-level agreements (SLAs) and service-level objectives (SLOs) by measuring and monitoring availability, performance, and overall system health.
  • Work as part of a global team on site-up issues so that services remain reliable, fault-tolerant, efficiently scalable, and cost-effective.
  • Scale systems sustainably through mechanisms such as automation; improve system reliability, efficiency, and velocity by advocating for changes.
  • Provide user support, incident response, and postmortems.

Requirements

  • Bachelor's degree or higher in Computer Science or a related technical discipline, with 2+ years of experience in the deployment and administration of large-scale distributed systems.
  • Strong understanding of Unix/Linux operating system internals and administration, networking (e.g. TCP/IP, routing, network topologies and hardware), storage systems, and database systems.
  • Experience in one or more programming languages, such as C, C++, Java, Python, Go, Ruby, Rust, JavaScript.
  • Experience debugging and optimizing code and automating routine tasks.
  • Experience in development, testing, deployment and administration of one or more of the following types of systems: Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, Kafka.
  • Experience in designing and analyzing large-scale distributed systems is preferred.
  • Strong skills in problem solving and communication.

Nice-to-haves

  • Experience in cloud services and infrastructure management.
  • Familiarity with monitoring tools and practices for large-scale systems.
  • Knowledge of security best practices and data protection protocols.

Benefits

  • 100% premium coverage for employee medical insurance and approximately 75% premium coverage for dependents.
  • Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) and 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.