Staff Site Reliability Engineer

Fivetran - Toronto, SD

posted 3 months ago

Full-time - Mid Level

Toronto, SD

Publishing Industries

About the position

Fivetran is on a mission to simplify and enhance access to data, making it as reliable as electricity. As a Staff Site Reliability Engineer, you will play a crucial role in building and maintaining data pipelines that power the modern data stack for thousands of companies. This position is based in our Toronto office and is a full-time role within the Site Reliability Engineering team. You will collaborate closely with engineering teams, product managers, and support and sales engineers to ensure the reliability and performance of the Fivetran Data Platform.

In this role, you will take ownership of the overall performance and reliability of Fivetran's infrastructure. This includes managing the robustness of the deployment pipeline and ensuring timely and effective incident response and resolution. You will be responsible for the growth and stability of the infrastructure, driving effective incident response, and implementing strategies for issue avoidance. Your work will directly impact the reliability of our production infrastructure, ensuring that our services are available, scalable, and secure.

You will utilize a variety of technologies including Kubernetes, PostgreSQL, ArgoCD, Terraform, Ansible, Python, Go, Java, AWS, GCP, Azure, Grafana, Buildkite, and Temporal. Your responsibilities will include monitoring the availability, capacity, and throughput of our systems, evolving our infrastructure by integrating reliability into our product roadmap, and coordinating the prioritization and resolution of critical bugs. You will also work closely with the security team to monitor and remedy infrastructure vulnerabilities, ensuring that our systems remain secure and reliable.

Responsibilities

Maintain the ongoing reliability and robustness of Fivetran's production infrastructure by monitoring availability, capacity, and throughput.
Evolve systems by adding reliability into our product roadmap.
Coordinate the re-prioritization or fixing of critical bugs for support or sales requirements as needed.
Recommend improvements to production infrastructure, working with engineering toward the goal of 100% availability.
Ensure scalable artifact deployment to all environments through automation scripts.
Constantly monitor infrastructure vulnerabilities and remedy them by working with the security team.

Requirements

5+ years of experience working with SaaS products at scale.
Working knowledge of managed Kubernetes (EKS, AKS, and GKE).
Knowledge of Cloud Platforms and related tooling: AWS, Azure, GCP, Terraform, Ansible, Buildkite, Pulumi, and ArgoCD.
Experience in Python/Shell scripting; bonus if you have Java, Go, etc.
Experience with Linux operating system internals and administration.
Experience with cloud networking such as VPNs, PrivateLink, and Private Service Connect (GCP).
Experience with databases such as PostgreSQL.

Nice-to-haves

Java programming skills
GoLang programming skills
