Qualys - Raleigh, NC
posted 3 months ago
As a Site Reliability Engineer for the Cloud Platform, you will play a key role in the full lifecycle of cloud platform services, from inception and design through deployment, operation, and continuous improvement. The work is performed in FedRAMP environments, so you must be a U.S. Person: a U.S. citizen, national, lawful permanent resident, asylee, or refugee. You may also be required to perform work that is restricted to U.S. citizens on U.S. soil.
In this position, you will focus on increasing the effectiveness, reliability, and performance of cloud platform technologies by identifying and measuring key performance indicators, making automated changes to production systems, and evaluating the results of those changes. You will support the cloud platform team through system design, capacity planning, and automation of key deployments. You will also help build a strategy for production monitoring and alerting, and participate in testing and verification.
You will keep cloud platform technologies properly maintained by monitoring availability, latency, performance, and overall system health, and you will advise the cloud platform team on improving system reliability and scaling with demand. During development, you will support new features, services, and releases, taking ownership of the cloud platform technologies. You will develop tools and automate processes for large-scale provisioning and deployment of these technologies.
Participation in an on-call rotation is expected. While on call, you will lead incident response and contribute to detailed postmortem reports that are candid and constructive. You will also propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting, and root cause analysis.