Delivers services at high scale and high availability, with resilience, using automation and Infrastructure as Code. Builds reliability into the ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. Manages systems using infrastructure as code and related tooling (IAM, ARM, Terraform, and Chef). Utilizes modern monitoring tools (Datadog, Prometheus, and Splunk). Automates with scripting languages, including Python and shell. Helps teams scale through production insights, operational automation, developer guidance, and real-time metrics.