Unclassified - San Ramon, CA
We are actively seeking a Senior Site Reliability Engineer to join our dynamic and innovative engineering team. The position allows remote work and may require travel or relocation to various anticipated and unanticipated locations throughout the United States, including client sites and our headquarters in San Ramon, CA.

As a Senior Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our production environments. You will design and build automation tools and systems that streamline software deployment and improve our operational efficiency. You will monitor production environments, promptly identifying and resolving issues to maintain high system reliability.

You will use Terraform to define and manage cloud resources in a version-controlled, automated manner, and you will develop and maintain automation scripts for deployment, configuration management, scaling, and other operational tasks on Amazon Web Services (AWS). A key responsibility will be setting up CI/CD pipelines in Jenkins so that our software deployments are automated and consistent.

You will also analyze and investigate production issues and incidents, define steps to reduce risk, and improve application quality metrics. Implementing incident response methodologies and building automation workflows for critical systems will be essential to maintaining high system reliability. You will collaborate closely with cross-functional teams, including developers, QA engineers, and stakeholders, to resolve issues, enhance functionality, implement improvements, and identify and eliminate performance bottlenecks.

You will work with a range of technologies, including AWS services, Jenkins, Terraform, Solr, Kubernetes, Datadog, and Prometheus, and will contribute to our customer-centric culture and commitment to innovation.