Capital One - Richmond, VA
posted 4 months ago
Capital One Technology Operations has an opening for a DevOps Platform Engineer with a passion for delivering superior system availability and customer experience. You must have strong knowledge of Amazon Web Services (AWS) infrastructure, automation, and Python scripting, and be able to apply that knowledge to manage cloud-based applications. Our ideal candidate will drive efficiency through automation, provide technical guidance and leadership internally, and partner with adjacent teams to deliver solutions. By mastering the full depth of the stack, you will drive reliability and performance at massive scale.

You should have experience with, and strong knowledge of, incident, problem, and change management processes. You will drive the technical resolution of both high- and low-severity incidents. In this role, you will design, deploy, and support automation and scripting solutions that deliver new capabilities, visibility, and efficiency, and you will influence resiliency and scalability in AWS production environments. Identifying opportunities for, and developing, proactive automated monitoring and alerting solutions with available tools (Splunk, New Relic, etc.) will be key to your success.

You will provide technical leadership and guidance on DevOps best practices and identify opportunities to reduce manual validation effort, driving toward zero-touch automation and self-healing. Your responsibilities will include proactively monitoring the availability, latency, performance, and capacity of all the applications and infrastructure behind Capital One's external and internal customer-facing services. You will also manage and govern relationships with the technology vendors used across the enterprise, and create, maintain, and use the appropriate technical procedural documentation (run books).
You will also serve as an Operations Tier 3 escalation resource for incident resolution. Ongoing on-the-job training and self-directed professional development are encouraged.