Cockroach Labs - New York City, NY

posted 3 days ago

Full-time - Mid Level
New York City, NY
Administrative and Support Services

About the position

CockroachDB provides the backbone for storing data on a global scale. As a Site Reliability Engineer, you'll help manage and scale our CockroachCloud service, a fully managed offering of CockroachDB. You will oversee our production system, ensuring that we can provide stable and scalable infrastructure as we deliver CockroachDB to our customers. CockroachCloud is a global service spanning multiple cloud providers. Roughly half of your time will be spent on greenfield development work, with an emphasis on developing tooling and driving automation. In this role, you will work across multiple teams within CockroachCloud, as well as with the development and product teams working on CockroachDB.

Responsibilities

  • Manage the infrastructure for cloud services, including running internal production systems and hosting CockroachDB for our external customers.
  • Design, write and deliver software and systems to increase product reliability and operational efficiency.
  • Develop custom tools as necessary.
  • Keep a complex system running and solve problems relating to mission-critical services.
  • Design, implement, operate, and troubleshoot the automation and monitoring of production clusters to maximize performance and availability.
  • Drive the company through disaster recovery tests, where we manually turn down pieces of CockroachDB to test its overall resilience to failures.
  • Participate in an on-call rotation for our production systems and hosted services.

Requirements

  • Expertise in analyzing, monitoring, and troubleshooting large-scale distributed systems.
  • Experience in software development using one or more of the following: Go, C, C++, Python, Java.
  • Proficiency working with algorithms, data structures, and production troubleshooting.
  • Expertise in working with major cloud providers (AWS, Azure, GCP, etc.) and Cloud APIs.
  • Experience debugging and optimizing code and automating routine tasks.
  • Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
  • Previous on-call experience, with a sense of urgency.
  • Experience building collaborative relationships with your colleagues.

Benefits

  • Competitive Health Insurance Coverage (for you & your dependents!)
  • Paid parental leave (with baby bucks)
  • Flex Fridays
  • Flexible time off & flexible hours
  • Education reimbursement
  • Relocation support or home office allowance