Elliott Moss Consulting - San Jose, CA
posted 2 months ago
We are seeking a highly skilled DevOps Engineer / Site Reliability Engineer (SRE) with expertise in managing complex environments and a deep understanding of Linux-based systems, Kubernetes, automation, and cloud platforms. The ideal candidate has strong experience deploying, managing, and troubleshooting large-scale applications, with a focus on automation, monitoring, and cloud services.

This position is based in San Jose, CA and requires onsite presence. The role is expected to last 12 months. We welcome candidates with various visa statuses, excluding OPT and CPT.

As a Site Reliability Engineer, you will be responsible for the reliability and performance of our systems and applications. You will work closely with development teams to implement best practices in automation and monitoring, and you will deploy and manage applications in a cloud environment. Your expertise in Kubernetes and cloud platforms will be crucial to keeping our infrastructure healthy and our services available and performant.

The ideal candidate will also have a solid understanding of storage technologies, particularly NetApp ONTAP, and will be proficient in scripting and automation with Shell, Ansible, and Python. You will use monitoring and performance tools such as Dynatrace, Apica, and Grafana to keep our systems operating optimally. Familiarity with CI/CD pipelines and DevOps tools such as Jenkins and GitLab CI is also essential. Strong problem-solving skills and the ability to troubleshoot complex systems are a must, as you will resolve issues that arise in our production environment.