Site Reliability Engineer

$135,000 - $350,000/Yr

Silver Valley Metals Corporation, site: Bunker Hil - Palo Alto, CA

posted 12 days ago

Full-time - Mid Level
Palo Alto, CA

About the position

As an Infrastructure Engineer at Alchemy, you will play a crucial role in enhancing the reliability and productivity of our globally used developer platform. Your primary focus will be on designing, deploying, and continuously improving the infrastructure that supports our engineering teams. This position emphasizes the importance of reliability, tooling, and best practices to ensure that our products are delivered efficiently and effectively to our customers.

Responsibilities

  • Set high standards for Reliability at Alchemy
  • Develop and own company-wide Reliability best practices such as SLO definition, incident management, postmortem reviews, launch readiness reviews, and change management
  • Architect production infrastructure and tools that encourage and enforce high reliability
  • Inspire the broader engineering organization to ensure Reliability is a first-class citizen in the products we build
  • Collaborate, partner, advise, review and mentor engineering teams on Reliability topics such as high-reliability architecture, observability, and safe change management
  • Improve critical infrastructure and systems that are used to operate infrastructure at scale (e.g. compute, networking, deployment, observability, code tooling/libraries)
  • Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
  • Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
  • Provide input into long-term platform requirements and operational guidelines with a focus on reliability
  • Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
  • Build and maintain documentation for processes and workflows

Requirements

  • 6+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)
  • Experience leading and driving company-wide reliability efforts and engineering initiatives
  • Experience with observability best practices and tooling like Prometheus, Grafana and Datadog
  • Experience designing and operating large-scale, multi-region production systems
  • Experience working with AWS or other cloud platforms
  • Experience with container runtimes and schedulers such as Docker and Kubernetes
  • Experience building deployment pipelines leveraging common CI/CD and GitOps tools (e.g. Argo, Flux)
  • Experience with Infrastructure-as-Code (e.g. Terraform, Pulumi, Chef, Puppet)
  • Strong communication and collaboration skills

Nice-to-haves

  • Experience running production services on bare metal
  • Experience with TypeScript and Python
  • Excellent understanding of web applications and architecture

Benefits

  • Competitive compensation including base salary and equity
  • Comprehensive medical, dental, and vision coverage
  • 401(k)
  • Unlimited flexible time off