Site Reliability Engineer

$140,000 - $165,000/Yr

Alloy - New York, NY

posted 2 months ago

Full-time - Mid Level
New York, NY
Personal and Laundry Services

About the position

Alloy is seeking a Site Reliability Engineer (SRE) to join our Infrastructure Team in New York City. This role is pivotal in ensuring that our services, which leading fintechs and top-tier banks rely on, maintain high uptime and exceed our Service Level Objectives (SLOs). As part of a team of five engineers, you will report to the Engineering Manager of Infrastructure and will be responsible for architecting and building infrastructure solutions that improve our operational reliability.

Your work will involve provisioning and managing a variety of AWS resources using Terraform, implementing solutions for deploying applications to Kubernetes in production, and helping to architect secure and reliable systems and deployment pipelines. You will be expected to write and review code comfortably, apply pragmatic thinking to build-versus-buy decisions, and continuously look for ways to improve our infrastructure. You will use tools such as Datadog, Splunk, or New Relic to identify latency issues in distributed systems and propose solutions to mitigate them. Participation in on-call rotations is part of the job, but your focus will be on building resilient, self-healing systems that minimize alerts.

You will also write infrastructure as code (IaC) using Terraform, automate processes with AWS tools, GitHub Actions, and custom scripts, and support application developers by removing constraints in the deployment pipeline. Continuous improvement is a key focus: you will look for ways to improve uptime, autoscaling, and recovery times while suggesting new cloud services and optimizing costs. Your contributions will be crucial to maintaining a high standard of service delivery and operational excellence at Alloy.

Responsibilities

  • Architect and build infrastructure solutions to improve uptime and exceed SLOs.
  • Provision and manage AWS resources using Terraform.
  • Implement solutions for deploying applications to Kubernetes in production.
  • Help architect and build secure and reliable systems and deployment pipelines.
  • Write and review code as part of the development process.
  • Use tools such as Datadog, Splunk, or New Relic to identify and resolve latency issues in distributed systems.
  • Participate in on-call rotations, focusing on building resilient and self-healing systems.
  • Write infrastructure as code (IaC) using Terraform.
  • Automate releases, deployments, and scaling events using AWS tools and GitHub Actions.
  • Support application developers by maintaining dev-prod parity and ensuring quick, predictable deployments.
  • Continuously seek opportunities to improve uptime, autoscaling, and recovery times.

Requirements

  • 4+ years of experience working on a DevOps, SRE, or Infrastructure team.
  • Experience writing infrastructure as code (IaC).
  • Experience running and troubleshooting applications in Docker.
  • Solid experience with CI/CD tools like GitHub Actions, CircleCI, or Travis CI.
  • Experience configuring and using troubleshooting tools like Datadog, CloudWatch, or ELK/EFK.
  • Experience being in an on-call rotation.
  • Practical experience scripting in languages such as Bash or Python.
  • Programming experience with one or more languages such as Python, JavaScript, or Go.
  • Experience provisioning production resources in AWS, GCP, or Azure using Terraform, CloudFormation, or Ansible.

Nice-to-haves

  • Software engineering experience.
  • Enthusiasm for a hybrid work schedule, as in-person attendance is required on Tuesdays and Thursdays.

Benefits

  • Unlimited PTO and flexible work policy.
  • Medical, dental, and vision plans with HSA and FSA options.
  • 401(k) with 100% match up to 4% of annual employee compensation.
  • 16 weeks of paid parental leave for eligible new parents.
  • Home office stipend for new employees.
  • Annual stipend for Learning & Development.
  • Well-being benefits including access to OneMedical and Headspace.