The Judge Group - Jersey City, NJ

posted 2 months ago

Full-time - Mid Level
Jersey City, NJ
Administrative and Support Services

About the position

The Judge Group Inc. is seeking a highly skilled Site Reliability Engineer (SRE) to join our team in Jersey City, NJ. This role is pivotal in ensuring the reliability, availability, and performance of our infrastructure and services. The ideal candidate will have a strong background in software development and systems engineering, with a focus on automating and optimizing infrastructure processes.

As an SRE, you will develop software tooling for programmable infrastructure, drive end-to-end monitoring and management of microservices, and implement Kubernetes compliance and standard processes. You will also create a self-service console for infrastructure visibility and automate tasks using cutting-edge technologies and best practices. You will manage the availability, scalability, and performance of the platform's infrastructure, converting application development bottlenecks into opportunities for automation, and you will build and maintain CI/CD environments that scale SaaS applications across multi-region and multi-cloud patterns.

Your expertise in Enterprise Linux, container management, and cloud services will be essential in driving our infrastructure initiatives forward. This position requires a proactive approach to problem-solving and a commitment to continuous improvement in our systems and processes.

Responsibilities

  • Develop full-fledged software tooling for programmable infrastructure (infrastructure as code).
  • Drive end-to-end microservices monitoring and management.
  • Implement Kubernetes compliance and standard processes (security, audits, network policies).
  • Create a self-service console for infrastructure visibility.
  • Automate tasks using cutting-edge technologies and standard methodologies.
  • Manage availability, scalability, and performance of the platform's infrastructure.
  • Convert application development bottlenecks into opportunities for automation.
  • Build and maintain CI/CD environments for scaling SaaS applications across multi-region and multi-cloud patterns.

Requirements

  • Strong knowledge of Enterprise Linux (Red Hat/CentOS).
  • Experience with container management (Kubernetes, Helm, Docker).
  • Proficiency in Amazon Web Services (AWS).
  • Strong programming skills in Python and Go, plus infrastructure automation with Ansible and Terraform.
  • Familiarity with monitoring tools (Grafana, Prometheus, Kibana) and incident management.