INSPYR Solutions - Phoenix, AZ
posted 3 months ago
The Site Reliability Engineer (SRE) is a critical role within our IT DevOps team, focused on the reliability and performance of our cloud-based services. The role requires a deep understanding of monitoring and alarming processes, as well as experience with cloud service platforms.

The SRE will stand up new cloud environments, determine system interactions and dependencies within the Voice Services product, and provide on-call support as needed, based on a rotation schedule and the severity of issues. The successful candidate will also route defects to the appropriate internal or external teams for remediation and administer problem management policies.

Collaboration is key in this role. The SRE will work closely with monitoring teams to develop new alarms and alerts for incidents that previously went undetected, ensuring that all relevant data that triggers an alarm is captured. The SRE will also define the monitoring types and thresholds to implement, maintain Mean Time to Repair (MTTR) documentation, and participate in weekly calls to stay current on system changes.

Experience with call center platforms, and with forecasting and capacity planning of system and network resources, is essential, as is the ability to provision resources cost-effectively. The SRE also needs a solid understanding of SSL certificates and general networking, and the ability to multitask and prioritize effectively in a fast-paced environment.