This job is closed
We regret to inform you that the job you were interested in has been closed. Although this specific position is no longer available, we encourage you to continue exploring other opportunities on our job board.
Monitors and analyzes performance metrics and application logs using application server technologies such as Tomcat, Node, or Apache. Works with the latest performance testing tools, including LoadRunner, CloudTest, Datadog, Grafana, and JMeter. Supports testing efforts across the multiple business units served by Enterprise Infrastructure (EI) to deliver resilient, highly available services at scale using automation and Infrastructure as Code. Builds ecosystem reliability by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. Defines and executes a comprehensive reliability and observability strategy so that systems across the enterprise are available whenever customers need them. Ensures platforms can support and scale to the needs of multiple business units. Coordinates systems using Infrastructure as Code tools (IAM, ARM, Terraform, and Chef). Builds, operates, and monitors services of distributed systems at scale, including their logging and alerting. Implements advanced observability practices and techniques at scale. Configures dashboards in Datadog, Splunk, Grafana, and Prometheus to track system resource utilization and all BPM metrics.