Splunk
posted 3 months ago
Splunk is dedicated to building a safer and more resilient digital world, and as an early-career Site Reliability Engineer (SRE), you will play a crucial role in this mission. The Cloud organization at Splunk develops and maintains robust platform solutions for the Software as a Service (SaaS) hosting of Splunk's enterprise software. This position is part of the TechOps team, which monitors and resolves issues that affect the availability and performance of Splunk for our cloud customers around the clock. As a member of this team, you will be the authority on customer experience, providing support and guidance to ensure that all technical issues are addressed promptly and effectively.

In this role, you will work a 4 x 10 shift schedule (four 10-hour shifts), Wednesday through Saturday, 4 PM to 2 AM. Your primary responsibilities will include providing technical support for the Splunk Cloud fleet, performing impact assessments, documenting issues and remediation steps, and leading support cases. You will also communicate with other TechOps engineers and business partners, assist with complex tasks, and represent the TechOps team in meetings to recommend new procedures and processes. Your ability to restore normal service operations quickly during escalated incidents will be vital in minimizing the impact on business operations.

You will thrive in this position if you have a passion for large, complex distributed systems. You will be expected to ask critical questions about automation and data-driven decision-making, ensuring that issues are resolved before they affect customers.

This is a fully remote position. To be considered, candidates must be U.S. citizens working on U.S. soil and must be able to support FedRAMP High requirements.