The Site Reliability Engineer (SRE) is responsible for the reliability, availability, and performance of our infrastructure and services. The SRE creates and supports automation scripts in shell, Ansible, and Python for infrastructure deployments, validations, and monitoring, reducing manual operational work and improving the efficiency of our systems. The SRE also schedules monitoring scripts with cron and Airflow so that our systems are continuously monitored and issues are addressed promptly.

In addition to automation and monitoring, the SRE handles incident management and problem resolution, working closely with other teams to troubleshoot and resolve issues as they arise.

The role requires extensive experience in IT infrastructure, particularly with Linux operating systems such as RHEL and CentOS, and a strong understanding of distributed computing and container orchestration frameworks, including Kubernetes. The SRE is also involved in database management, which requires knowledge of both SQL and NoSQL databases. The ideal candidate has a strong background in building CI/CD pipelines and is familiar with cloud platforms, specifically AWS.

This is a hybrid position, with two days in the San Jose, CA office and three days of remote work, providing flexibility while maintaining collaboration with the team.