TikTok - San Jose, CA

posted 3 months ago

Full-time - Mid Level
San Jose, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

At TikTok, we are on a mission to inspire creativity and bring joy to our users. As a leading destination for short-form mobile video, our platform is designed to help imaginations thrive. Our global headquarters are located in Los Angeles and Singapore, with offices in major cities around the world including New York, London, and Tokyo. We believe that every challenge is an opportunity for growth and innovation, and we are committed to creating an environment where our teams can collaborate and drive impact together.

The role of Site Reliability Engineer (SRE) is crucial to our success, as it involves providing support for the deployment and maintenance of our machine learning (ML) systems and platforms. This includes overseeing training, inference, and pipeline orchestration in a production environment, all while working under the guidance of senior-level SREs. The ideal candidate will be responsible for designing and implementing software platforms and infrastructures, ensuring system health through effective monitoring, and developing large-scale distributed ML training and serving systems. In addition to technical responsibilities, the SRE will assist in managing frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.

We value sustainable user support, incident response, and conducting blameless postmortems to continuously improve our processes. This position offers an exciting opportunity to be part of a dynamic team that is tackling new challenges and developing innovative solutions in a fast-paced environment.

Responsibilities

  • Provide site reliability engineering support to deploy and maintain the machine learning (ML) system and platform, including training, inference, and pipeline orchestration in the production environment under the guidance of Senior-level SREs.
  • Design and implement software platforms, infrastructure, and services, and monitor them to ensure system health.
  • Develop large-scale distributed ML training and serving systems.
  • Assist the team in managing frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.
  • Practice sustainable user support, incident response, and blameless postmortems.

Requirements

  • Must have a Bachelor's degree in Computer Science, Engineering (any), Information Technology, Mathematics, Statistics, Physics, or a related field.
  • 2 years of related work experience, with 1 year of experience in each of the following:
      • Designing web applications using the Java, Go, and Python programming languages.
      • Performing Linux administration, including monitoring performance, debugging issues, and monitoring network behavior, as well as designing and troubleshooting networked applications using the OS networking protocol stack and OS concepts such as virtualization, containerization, and memory management.
      • Designing scalable and highly available software systems for cloud environments.
      • Working across all phases of the SDLC, including requirements gathering and analysis, design, development, implementation, testing, deployment, and maintenance of back-end and cloud-native projects.
      • Building automation tools and scripts.

Benefits

  • 100% premium coverage for employee medical insurance
  • Approximately 75% premium coverage for dependents
  • Health Savings Account (HSA) with company match
  • Dental insurance
  • Vision insurance
  • Short/Long-term Disability insurance
  • Basic Life, Voluntary Life and AD&D insurance plans
  • Flexible Spending Account (FSA) Options
  • 10 paid holidays per year
  • 17 days of Paid Personal Time Off (PPTO)
  • 10 paid sick days per year
  • 12 weeks of paid Parental leave
  • 8 weeks of paid Supplemental Disability
  • Mental and emotional health benefits through EAP and Lyra
  • 401K company match
  • Gym reimbursement
  • Cellphone service reimbursement