Senior or Staff Site Reliability Engineer - Performance Engineering

Unclassified - Phoenix, AZ

posted 5 months ago

Full-time - Senior

About the position

As a Senior Site Reliability Engineer at Circle, you will play a crucial role in designing, building, and maintaining the infrastructure that supports Circle's growing worldwide customer base. This position requires a deep understanding of public cloud providers and the ability to ensure that Circle's products and core systems operate consistently and performantly. You will be part of a dynamic and fast-paced environment where collaboration with cross-functional teams is essential. Your expertise will help in delivering exceptional customer experiences while continuously learning and developing your skills.

In this role, you will support multiple development teams by providing an agile and responsive CI/CD platform that enables high-quality builds with measurable performance and quality. You will be responsible for building, maintaining, improving, scaling, and securing cloud infrastructure and resources using Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, and Ansible. Automation will be a key focus, as you will automate operational tasks using programming languages like Go and Python, as well as serverless solutions like AWS Lambda and Kubernetes Jobs. You will also design, manage, and monitor Kubernetes clusters for various production workloads, and contribute to the development of Circle's blockchain infrastructure by creating and managing blockchain nodes across multiple blockchains, including Algorand, Ethereum, and Solana.

Participation in an on-call rotation will be required to mitigate disruptions in production systems, along with conducting root cause analysis when issues arise. Additionally, you will plan and test disaster recovery scenarios for a highly available microservices architecture and collaborate with the Security team to maintain a strong security posture. Mentoring and engaging with team members will be an important aspect of your role, as you help grow and scale the team.

This position offers a unique opportunity to work in a collaborative and innovative environment, making a significant impact on the company's infrastructure and customer experience.

Responsibilities

Support multiple development teams with an agile, responsive CI/CD platform to deliver high-quality builds with measurable performance and quality
Build, maintain, improve, scale, and secure cloud infrastructure and resources using IaC tools (Terraform, CloudFormation, Ansible)
Automate operational tasks via Go, Python, and serverless solutions (AWS Lambda, Kubernetes Jobs)
Design, manage, and monitor Kubernetes clusters for multiple production workloads
Drive forward blockchain infrastructure by creating and managing blockchain nodes across various blockchains (Algorand, Ethereum, Hedera, Flow, Solana, Stellar)
Participate in an on-call rotation to mitigate disruption for any production systems and conduct root cause analysis
Plan and test disaster recovery scenarios for a highly available microservices architecture
Collaborate with the Security team to create and maintain security-focused tools and frameworks
Engage and mentor team members and help grow and scale the team

Requirements

4+ years in DevOps or SRE roles, with a focus on tooling, automation, and infrastructure on a major public cloud provider
Proficiency with coding and/or scripting in Go, Python, and Shell
At least 3 years of combined experience in building and maintaining CI/CD platforms and supporting agile engineering teams in building microservices
Experience with building Docker images and deploying containers in Kubernetes clusters
Familiarity with modern CI/CD platforms with complex gates and workflows
Knowledge of Blue-Green, Canary, and A/B Testing deployment strategies
Experience with distributed blockchain systems and maintaining blockchain full nodes
Familiarity with database technologies (PostgreSQL, Redis, OpenSearch)
Experience in migrating and transforming large, complex datasets from diverse sources
Knowledge of data warehousing tooling and services (Apache Airflow, AWS DMS, Snowflake)
Understanding of network routing, DNS, load balancing, and edge networking
Familiarity with APM, RUM, monitoring, and telemetry tools
Experience with Helm charts and maintaining Kubernetes clusters
Ability to author and maintain IaC with Terraform and deploy resources in public cloud providers (AWS, Azure, GCP)
Strong skills in observability, troubleshooting, and performance solutions
Excellent communication skills and ability to explain technical concepts to peers and stakeholders

Nice-to-haves

7+ years in DevOps or SRE roles, with a focus on tooling, automation, and infrastructure on a major public cloud provider
Experience leading teams technically on architecture and system design
Deep understanding of API design and REST principles
Experience with cloud services (AWS, Google Cloud, Microsoft Azure)
Familiarity with containers and Kubernetes
Strong focus on coding standards and code quality with a desire for excellent test coverage
