TikTok - San Jose, CA

posted 3 days ago

Full-time - Mid Level
San Jose, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. The Data Systems Infrastructure (DSI) team at TikTok plays a crucial role in supporting this mission by constructing and managing the infrastructure that powers our global operations.

As a Production Systems Engineer, you will be at the forefront of this effort, ensuring the reliability, efficiency, and scalability of our data center and Cloud operations on a worldwide scale. Your work will involve engaging in the entire lifecycle of infrastructure systems, from design consulting to deployment and refinement, contributing to the overall enhancement of our services.

In this role, you will be responsible for delivering tools and solutions that improve automation, monitoring, and disaster recovery processes. You will troubleshoot complex technical issues in high-pressure environments, conduct root-cause analyses for service interruptions, and establish preventive measures to ensure the smooth operation of our systems.

Collaboration is key, as you will partner with various stakeholders, including infrastructure architects, project managers, and internal customers, to understand business objectives and design innovative solutions for our Core IDCs and CDN/Edge and Cloud Services. Your responsibilities will also include creating and maintaining technical documentation, participating in on-call rotations, and contributing to incident response efforts.

This position offers an exciting opportunity to be part of a dynamic team that is shaping the future of technology at TikTok, where every challenge is viewed as an opportunity for growth and innovation.

Responsibilities

  • Contribute to enhancing the quality, reliability, efficiency, effectiveness, and scalability of data center and Cloud operations.
  • Engage in and improve the whole lifecycle of Infrastructure systems from design consulting to launch reviews, deployment, operation, and refinement.
  • Deliver tools and solutions to improve automation, reliability, scalability, and operability of services.
  • Deliver tools and solutions to improve monitoring of availability, latency, and the overall health of services, servers, Cloud infrastructure, and networks.
  • Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment.
  • Conduct high-level root-cause analysis for service interruptions and establish preventive measures.
  • Practice sustainable incident response and conduct postmortems.
  • Partner with stakeholders to understand overarching business objectives and design innovative solutions for Core IDCs and CDN/Edge and Cloud Services.
  • Create and maintain standard operating procedures and knowledge bases.
  • Participate in on-call rotations and incident response teams to solve critical problems in production.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
  • Minimum 3 years of experience in systems infrastructure operations or related fields, working with data center or CDN production systems and system design/validation.
  • Intermediate-level understanding of server hardware, including hands-on experience with hardware design, evaluation, validation, and diagnostics.
  • Intermediate-level proficiency in Linux operating systems.
  • Fluency in Bash, Python, and Golang.

Nice-to-haves

  • Intermediate-level expertise in data center operations, including OS installations and break-fix.
  • Intermediate-level skills in monitoring server health, network switches, and data center conditions.
  • Experience with at least one automation project.
  • Junior-level understanding of networking concepts.
  • Experience managing and coordinating teams in a global environment.
  • Experience in project management, including preparing project plans and managing multiple projects simultaneously.
  • Familiarity with Agile methodologies (e.g., Kanban, Scrum).
  • Preferred skills include Golang, REST APIs, Gin, Ansible, load balancers, SQL, Hive, Hadoop, ClickHouse, message queues, and Redis.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% for dependents.
  • Health Savings Account (HSA) with company match.
  • Dental and Vision insurance.
  • Short-term and long-term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for healthcare and dependent care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO).
  • 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match.
  • Gym and cellphone service reimbursements.