The Judge Group - Jersey City, NJ

posted 16 days ago

Full-time
Jersey City, NJ
Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) at The Judge Group Inc. develops and manages software tooling for programmable infrastructure and is responsible for the reliability and performance of the platform's infrastructure. The role involves automating tasks, implementing monitoring solutions, and managing CI/CD environments that support scalable SaaS applications across multiple cloud environments.

Responsibilities

  • Develop full-fledged software tooling for programmable infrastructure (infrastructure as code).
  • Drive end-to-end microservices monitoring and management.
  • Implement Kubernetes compliance and standard processes (security, audits, network policies).
  • Create a self-service console for infrastructure visibility.
  • Automate tasks using cutting-edge technologies and standard methodologies.
  • Manage availability, scalability, and performance of the platform's infrastructure.
  • Convert application development bottlenecks into opportunities for automation.
  • Build and maintain CI/CD environments for scaling SaaS applications across multi-region and multi-cloud patterns.

Requirements

  • 8 years of experience in a related field.
  • Strong knowledge of core enterprise Linux (Red Hat/CentOS).
  • Experience with container management (Kubernetes, Helm, Docker).
  • Proficiency in Amazon Web Services (AWS).
  • Strong programming skills in Python and Go, and proficiency with Ansible and Terraform.
  • Familiarity with monitoring tools (Grafana, Prometheus, Kibana) and incident management.