ByteDance - San Jose, CA

posted 3 days ago

Full-time - Mid Level
San Jose, CA
Professional, Scientific, and Technical Services

About the position

As a Site Reliability Engineer at CapCut, you will play a crucial role in ensuring the stability and performance of our large-scale systems. Your primary responsibility will be to design and develop solutions that automate technical operations, working closely with other teams to improve system stability across the Software Development Lifecycle. Strengthening the stability of CapCut systems includes monitoring, logging, dashboard creation, and developing diagnostic tools. You will conduct regular drills and create remediation plans to achieve rapid service restoration, and you will be expected to take shifts to respond to production issues across different regions.

In addition to operational responsibilities, you will define key performance indicators to evaluate system performance and runtime, improving observability and supporting system development and troubleshooting. You will also plan system capacity in line with business expansion and scheduled promotions. This position requires a proactive approach to problem-solving and a strong sense of ownership, as you will address system issues and collaborate with teams to implement effective solutions.

At CapCut, we are committed to fostering a culture of creativity and innovation. Our team is passionate about learning and taking on challenges, and we value good ideas that drive impact for our users. As part of a young and dynamic team, you will have the opportunity to contribute to the development of cutting-edge AI technology that enhances content creation while ensuring user privacy and data security.

Responsibilities

  • Design and develop solutions to automate the technical operations of large-scale systems.
  • Work closely with teams to improve stability from a Software Development Lifecycle perspective.
  • Strengthen CapCut systems' stability, including monitoring, logs, dashboards, and diagnostic tools.
  • Conduct regular drills and develop remediation plans for fast service restoration.
  • Take shifts to respond to production issues across regions.
  • Define indicators to evaluate system performance and runtime to improve observability.
  • Plan system capacities according to business expansion and scheduled promotions.

Requirements

  • Bachelor's or higher degree in Computer Science or related technical discipline.
  • 2-5 years of work experience in the Internet industry.
  • Solid knowledge of Computer Science principles, including Operating Systems, Computer Storage, and Computer Networking.
  • Software development experience in at least one programming language (Java, Go, C++, Python, JS).
  • Strong ability to resolve system problems, good communication skills, and a sense of ownership.

Nice-to-haves

  • Experience with Redis, MySQL, Nginx, Kubernetes, Docker.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% for dependents.
  • Health Savings Account (HSA) with company match.
  • Dental, Vision, Short/Long term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for healthcare and dependent care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO).
  • 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match.
  • Gym and cellphone service reimbursements.