Fidelity Investments - Hoboken, NJ

posted 2 months ago

Full-time - Principal
Hoboken, NJ
Securities, Commodity Contracts, and Other Financial Investments and Related Activities

About the position

As a Principal Site Reliability Engineer at Fidelity Investments, you will be an integral member of the TechOps SRE team, collaborating closely with engineering partners to drive initiatives from design through to implementation. Your primary focus will be on our highly available multi-region Kubernetes environments, specifically utilizing AWS EKS, which are central to our enterprise-grade infrastructure strategy. These environments currently support numerous mission-critical workloads, and in this exciting role, you will have the opportunity to further develop your skills while working in a fun, collaborative, and rapidly changing environment. This position offers a phenomenal opportunity to have a direct impact on the emerging strategies of our infrastructure and deployments, while also helping to enable the expansion of our business.

In this role, you will leverage your extensive experience with AWS and Kubernetes to manage and maintain our cloud infrastructure. You will be responsible for crafting and deploying applications to the cloud, promoting a DevOps mentality, and providing mentorship to other engineers. Your technical leadership will be crucial in driving the design of highly available, secure, and scalable microservices-based applications in AWS. You will also work multi-functionally with other organizations, collaborating with risk, product, and engineering team leaders to ensure the reliability and uptime of our applications.

The Principal Site Reliability Engineer will also champion automation tools to improve software delivery and reduce risk, while maintaining logging, monitoring, and alerting capabilities using tools like Datadog and Splunk. You will be expected to communicate effectively at all levels, seeing problems as opportunities to automate and improve processes. This role requires a proactive approach to managing infrastructure and a commitment to continuous improvement in our cloud operations.

Responsibilities

  • Collaborate with engineering partners to drive initiatives from design to implementation.
  • Manage and maintain multi-region Kubernetes environments on AWS EKS.
  • Craft and deploy applications to the cloud, promoting a DevOps mentality.
  • Provide technical leadership to teams of Site Reliability Engineers and Cloud Engineers.
  • Champion automation tools to improve software delivery and reduce risk.
  • Maintain logging, monitoring, and alerting capabilities using tools like Datadog and Splunk.
  • Work multi-functionally with other organizations, including risk, product, and engineering teams.
  • Drive the overall design of highly available, secure, and scalable microservices-based applications in AWS.
  • Mentor team members and establish standard development methodologies for AWS infrastructure-as-code.
  • Configure and deploy resilient infrastructure across multiple regions and availability zones.

Requirements

  • 5+ years of hands-on experience with AWS in a production environment.
  • Experience building and deploying Docker images, including use of Docker Compose.
  • Production experience running Kubernetes workloads, ideally on AWS EKS.
  • Experience managing and maintaining Kubernetes clusters on AWS EKS.
  • Experience with Confluent or Kafka.
  • Experience creating and deploying Helm charts & libraries.
  • Hands-on experience with Jenkins Core, including authoring and maintaining declarative CI/CD pipelines and libraries.
  • Experience with monitoring tools, e.g., CloudWatch, Datadog, and Splunk Cloud.
  • Proficiency with UNIX operating systems and shell scripting.
  • Experience managing services and applications in a large AWS cross-account environment using IAM and federated SSO.
  • Ability to communicate at all levels with a track record of strong written and verbal communications.
  • Ability to work independently with minimal direction.
  • Experience with infrastructure-as-code (IaC), Terraform preferred.
  • Programming experience, Python preferred.

Nice-to-haves

  • Experience with distributed version control systems, Git preferred.
  • Experience with Apache or Confluent Kafka a plus.
  • Experience with the agile software development lifecycle and Kanban preferred.
  • Experience with CDN providers, e.g., Akamai, preferred.

Benefits

  • Comprehensive health care coverage and emotional well-being support.
  • Market-leading retirement plans.
  • Generous paid time off and parental leave.
  • Charitable giving employee match program.
  • Educational assistance including student loan repayment and tuition reimbursement.
  • Learning resources to develop your career.