Primus Global Services - Sunnyvale, CA
posted 2 months ago
The Site Reliability Engineer (SRE) position is a long-term opportunity with one of our largest clients, located in either Durham, NC or Sunnyvale, CA. This role is critical to the reliability and performance of our systems and requires a strong background in site reliability engineering principles. The ideal candidate will have extensive experience with Linux distributions, particularly RHEL and CentOS, and will be adept at shell scripting, filesystem management, and the system utilities used to maintain system health and performance.

In this role, you will work with distributed computing systems and container orchestration frameworks, including Kubernetes and Rancher. A solid understanding of Kubernetes objects is essential, as you will deploy and manage applications in a cloud-native environment. Experience with storage solutions, particularly ONTAP, is preferred, as you will be involved in managing volumes and aggregates, backups, and disaster recovery planning.

Automation is a key focus of this position. You will be expected to create and support automation scripts in shell, Ansible, and Python to streamline infrastructure deployment, validation, and monitoring. Familiarity with scheduling monitoring scripts using cron and Airflow is also required.

You will work with monitoring tools such as Dynatrace, Apica, and Grafana to ensure system performance and reliability. A good understanding of both SQL and NoSQL databases is necessary, as is experience building CI/CD pipelines, particularly in cloud environments such as AWS. Incident handling and problem management are also part of the role, with the expectation that issues are resolved promptly and effectively.