Randstad - Merrimack, NH
posted 4 months ago
As a Site Reliability Engineer (SRE), you will be responsible for the reliability and performance of our large-scale distributed systems. This role combines software engineering and systems engineering principles to build and maintain fault-tolerant systems that can handle massive workloads. You will manage Kubernetes clusters, particularly on Amazon EKS, and will need strong troubleshooting skills in this area. Your expertise in Python programming and API development will be crucial as you enhance our systems and automate processes.

You will also work with monitoring and data visualization tools such as Datadog, Splunk, ELK, Prometheus, and Grafana. Hands-on experience with these tools is needed for log aggregation, monitoring, and alerting, so that performance problems are detected and resolved quickly. You will be expected to implement AWS products and services effectively, using infrastructure-as-code tools such as CloudFormation and Terraform to manage our cloud resources.

Collaboration is key in this role, as you will work within a globally distributed team. Strong written and oral communication skills are essential for conveying complex technical information clearly. You will also be expected to contribute to a DevOps culture, embracing agile methodologies and continuously learning new technologies and practices to improve our systems and processes.

This is a contract position based in either Merrimack, NH, or Westlake, TX, with a work schedule of 9 AM to 5 PM.