This job is closed
We regret to inform you that the job you were interested in has been closed. Although this specific position is no longer available, we encourage you to continue exploring other opportunities on our job board.
The position involves building and operating highly resilient platforms in AWS cloud environments. The role requires coordinating systems with Infrastructure as Code tools such as IAM, ARM, Terraform, and Chef, and performing reliability engineering across the entire Software Development Lifecycle (SDLC) in programming languages such as Python, Node.js, or Java. Responsibilities include deploying and supporting distributed, multi-tiered application systems using Kubernetes and Continuous Integration/Continuous Deployment (CI/CD) pipelines.
The role also involves creating dashboards that capture the latency, availability, error, and saturation performance of applications using tools such as Splunk, Grafana, Prometheus, Catchpoint, and Datadog. The candidate will also create Service-Level Indicator/Service-Level Objective (SLI/SLO) dashboards and automate dashboard updates and the creation of new dashboards.
The position requires identifying and resolving application issues using Datadog, Prometheus, and Splunk, and creating, maintaining, and tuning monitors using ELK, OpenSearch, and OpenTelemetry. The candidate will support applications hosted in AWS Cloud and Kubernetes, and will build, deploy, automate, and support application services across multiple technology platforms, frameworks, and languages.