Cointelegraph - San Francisco, CA
posted 5 months ago
The Site Reliability Engineer (SRE) role is pivotal in ensuring the reliability and performance of our services throughout their lifecycle, from inception and design through deployment, operation, and refinement. As an SRE, you will embed with engineering teams to apply industry best practices, ensuring that our systems, infrastructure, and applications are built and managed through automation.

Before services go live, you will support them through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and conducting launch reviews. Once services are live, you will maintain them by measuring and monitoring their availability, latency, and overall system health. You will also scale systems sustainably through mechanisms like automation, and evolve them by advocating for changes that improve reliability and velocity.

You will practice sustainable incident response and conduct blameless postmortems to learn from incidents. Together with your engineering team, you will share an on-call rotation and serve as an escalation contact for service incidents, so that we maintain high service levels and quickly address any issues that arise.