ByteDance - Bellevue, WA
posted 2 months ago
As a Senior Site Reliability Engineer at ByteDance, you will play a crucial role in enhancing the lifecycle of our infrastructure services. This includes everything from the initial design and development phases to capacity planning, launch reviews, deployment, operation, and ongoing refinement of our systems. You will be responsible for designing and implementing software platforms and monitoring frameworks that support efficient, automated, and intelligent governance of our service-oriented architecture (SOA). Your work will directly contribute to scaling our systems sustainably through automation, while also evolving the reliability, efficiency, and velocity of our services by advocating for necessary changes.

In this position, you will maintain services to meet our service-level agreements (SLAs) and service-level objectives (SLOs) by continuously measuring and monitoring the availability, performance, and overall health of our systems. You will also provide user support, respond to incidents, and conduct postmortems to analyze and learn from any issues that arise. Participation in technical operations and rotations will be expected, particularly in response to performance and reliability challenges. Additionally, you will have the opportunity to mentor junior Site Reliability Engineers and interns, fostering their growth and development within the team.