This job is closed

Capital One - Washington, DC

posted 2 months ago

Full-time - Senior
Washington, DC
Credit Intermediation and Related Activities

About the position

The Sr. Distinguished Engineer - Platform Operations at Capital One is a leadership role focused on optimizing and managing operations for the Machine Learning and Artificial Intelligence platform. The position sets strategic roadmaps, oversees day-to-day management, and ensures that the platform's performance aligns with industry best practices. The role emphasizes collaboration, diversity, and innovation, aiming to create a culture where all associates can thrive and contribute their unique perspectives.

Responsibilities

  • Set the roadmap and oversee the management of AI and ML platforms, including strategies for container management in public cloud (AWS) and cloud resource provisioning.
  • Maintain a deep understanding of the technical aspects of the platform, including infrastructure, algorithms, APIs, and integrations.
  • Provide operations leadership to engineering and production teams.
  • Implement robust processes and operations dashboards to monitor platform performance, user feedback, and adherence to service level agreements (SLAs).
  • Collaborate with cyber, technology risk management, security, and compliance teams to understand company requirements.
  • Work closely with product and engineering teams to ensure adherence to industry best practices and corporate standards.
  • Implement automation and dashboards to visualize vulnerabilities and platform incidents for proactive decision-making.
  • Develop a long-term vision and roadmap for platform operations enhancements in collaboration with executive leadership.
  • Build a high-performing operations team, recruiting and retaining world-class engineers.

Requirements

  • Bachelor's Degree
  • At least 9 years of experience managing platform operations, infrastructure operations, or Site Reliability Engineering.
  • At least 5 years of experience with public cloud technologies.

Nice-to-haves

  • Master's Degree in a STEM field (Science, Technology, Engineering, or Mathematics)
  • 5+ years of experience in managing large-scale, high-performance, distributed systems as a Site Reliability Engineer or product engineer.
  • 5+ years of experience in setting up and scaling observability platforms and creating operational health dashboards.
  • 3+ years of experience in building systems within a regulated environment.
  • 3+ years of experience in Artificial Intelligence, Machine Learning, or Cloud infrastructure.
  • 3+ years of experience managing distributed systems, multi-tenant architectures, microservices, and container orchestration (Kubernetes).
  • 5+ years of experience with the machine learning lifecycle.

Benefits

  • Comprehensive health benefits
  • Financial benefits including performance-based incentives
  • Inclusive workplace culture
  • Opportunities for professional development