Cisco - San Francisco, CA
posted 3 months ago
The Principal Site Reliability Engineer for Datastores at ThousandEyes will play a crucial role in ensuring the reliability and performance of the platform's mission-critical datastores, which include Elasticsearch, Kafka, MongoDB, and MySQL. The position covers all aspects of datastore reliability: availability, performance, change management, capacity planning, monitoring, and emergency response.

As a leader in this role, you will innovate and provide a strong technical vision while collaborating with various teams to build reliable, scalable, and highly available datastores on a platform that operates at multi-region scale. You will partner with leaders across ThousandEyes as a subject matter expert in datastores, helping to design optimal architectures and processes. You will also serve as a role model for the engineering team, promoting effective delivery and teamwork.

The position requires a reliability-focused engineering leader who is passionate about automation and operational excellence, particularly in managing ever-growing volumes of data. The ideal candidate will have deep knowledge of datastores and experience building and supporting mission-critical systems. You will be expected to ensure that the ThousandEyes platform's services use the appropriate datastore infrastructure, designed and optimized for availability, latency, and performance. Strong technical vision and the ability to communicate effectively with various stakeholders are essential, as is a hands-on approach to writing software and automating processes to improve datastore reliability. You will also be expected to mentor and uplift the team, fostering a culture of learning and collaboration.