TikTok - Mountain View, CA

posted 3 days ago

Full-time - Mid Level
Mountain View, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., created to enhance focus and governance on our data protection policies and content assurance protocols to ensure the safety of U.S. users. The teams within USDS are dedicated to providing oversight and protection of the TikTok platform and U.S. user data, allowing millions of Americans to continue using TikTok for learning, earning, self-expression, and entertainment.

The Global E-commerce Site Reliability Engineer (SRE) team works closely with engineering and product teams to build and maintain large-scale, globally distributed, observable, and fault-tolerant systems. As an SRE, you will take ownership of production systems and be responsible for observability and automation across complex service mesh architectures.

In this role, you will own the service level of a critical, revenue-generating E-commerce platform, focusing on service reliability, scalable design, and release management in a cloud-native environment. You will define service level indicators and data-driven objectives to improve uptime, latency, and system health of a core TikTok production platform. Collaboration with engineering and product teams is essential to ensure that key requirements such as capacity planning and launch reviews are performed to enable transparent service delivery to customers. Automation will be a key focus, aimed at infrastructure-as-code, scalability, and service resiliency. You will also implement SRE practices around incident management and post-mortems while participating in on-call rotations.

Responsibilities

  • Own the service level of a critical, revenue-generating E-commerce platform and all supporting infrastructure and services.
  • Focus on service reliability, highly scalable design, and release management in a cloud-native environment.
  • Define service level indicators and data-driven objectives to uphold and improve uptime, latency, and system health of a core TikTok production platform.
  • Collaborate cross-team with engineering and product to ensure that key requirements (such as capacity planning and launch reviews) are performed to enable transparent service delivery to customers.
  • Implement automation geared towards infrastructure-as-code, scalability, and service resiliency.
  • Implement SRE practices around incident management and post-mortems while being part of on-call rotations.

Requirements

  • Good understanding of Unix/Linux operating system internals and networking.
  • Experience writing code in Java, Go, Python, or a similar language.
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems (Redis, Elasticsearch, Kafka, Druid, Hadoop, Flink or comparable solutions), relational databases, caching solutions, and web service frameworks.
  • Experience with algorithms, data structures, complexity analysis, and software design.
  • Experience developing tools and APIs to reduce manual interaction with systems and applications using a variety of coding and scripting standards.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.

Nice-to-haves

  • Familiarity with running production-grade web services at scale and an understanding of cloud-native technologies and networking.
  • Knowledge of a variety of strategies for ingesting, modeling, processing, and persisting data, as well as ETL design, dimensional modeling, and cube design.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents, and a Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options like Health Care, Limited Purpose, and Dependent Care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure) and 10 paid sick days per year.
  • 12 weeks of paid Parental Leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.