This position involves identifying, designing, and implementing changes to existing services to improve reliability, performance, and standardization across all AWS services, microservices, and serverless applications. You will proactively identify inefficient resource utilization and remediate it to improve platform stability and cost efficiency. You will work closely with product development teams to understand their services and support them as they build on SRE-related platforms. Key responsibilities include troubleshooting production issues, performing root cause analysis, and designing solutions that prevent recurrence. You will also plan and test for capacity growth, monitor services, create intelligent alerting that speeds up incident detection and resolution, and build automation and internal tools that improve processes. Hands-on experience with Datadog and Splunk, as well as proficiency in Python, .NET, and Java, is essential.