Senior Site Reliability Engineer

$181,272 - $195,000/Yr

Unclassified - San Ramon, CA

posted 5 months ago

Full-time - Mid Level

About the position

We are actively seeking a Senior Site Reliability Engineer to join our dynamic and innovative engineering team. This position offers the flexibility of remote work and may involve travel or relocation to various anticipated and unanticipated locations throughout the United States, including client sites and our headquarters in San Ramon, CA.

As a Senior Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our production environments. You will design and build automation tools and systems that streamline the software deployment process and improve operational efficiency. You will monitor production environments and promptly identify and resolve issues. You will use Terraform to define and manage cloud resources in a version-controlled and automated manner, and develop and maintain automation scripts for deployment, configuration management, scaling, and other operational tasks on Amazon Web Services (AWS). Setting up CI/CD pipelines with Jenkins will be a key responsibility, ensuring that our software deployments are automated and consistent.

You will also analyze and investigate production issues and incidents, and define steps to reduce risk and improve application quality metrics. Implementing incident response methodologies and developing automation workflows for critical systems will be essential to maintaining high system reliability. You will collaborate with cross-functional teams, including developers, QA engineers, and stakeholders, to address issues, enhance functionality, implement improvements, and identify and resolve performance bottlenecks. You will work with a variety of technologies, including AWS services, Jenkins, Terraform, Solr, Kubernetes, Datadog, and Prometheus, contributing to our commitment to a customer-centric culture and innovation.

Responsibilities

Design and construct automation tools and systems to streamline the software deployment process.
Monitor production environments and promptly identify and resolve issues.
Utilize Terraform to define and manage cloud resources in a version-controlled and automated manner.
Develop and maintain automation scripts to streamline deployment, configuration management, scaling, and other operational tasks using Amazon Web Services (AWS).
Set up CI/CD pipelines using Jenkins to ensure automated and consistent software deployments.
Analyze and investigate production issues and incidents, and define steps to reduce risk and improve application quality metrics.
Implement incident response methodologies and develop automation workflows for critical systems to maintain high system reliability.
Collaborate with cross-functional teams, including developers, QA engineers, and stakeholders, to address issues, enhance functionality, implement improvements, and identify and resolve performance bottlenecks.

Requirements

Master of Science in Computer Science, Computer Engineering, or closely related field.
At least two (2) years of experience in the job offered, or at least two (2) years of experience in System Administration, Automation and Infrastructure as Code, Amazon Web Services (AWS), Terraform, Datadog, Containerization and orchestration using Jenkins, Monitoring and Alerting, and Configuration Management.

Benefits

Opportunity to work on bleeding-edge projects
Competitive salary
Flexible schedule
Medical insurance
Benefits program
Social package - medical insurance, sports
Corporate social events
Professional development opportunities
