This job is closed

Staff Site Reliability Engineer

$135,520 - $178,060/Yr

Crisis Text Line - New York, NY

posted about 2 months ago

Full-time - Senior
New York, NY

About the position

As a Staff Site Reliability Engineer (SRE) at Crisis Text Line, you will play a crucial role in ensuring the reliability, scalability, and security of our platform. Reporting to the Senior Engineering Manager of SRE/Infrastructure, you will architect, build, and maintain the tooling that empowers our software engineering teams while managing the infrastructure that supports our mission of providing mental health support. The role emphasizes improving engineer productivity through automation and collaborating with developers to optimize performance and security.

Responsibilities

  • Lead and mentor a team of 5 SREs, fostering a collaborative and innovative work environment.
  • Work closely with TechOps/Security staff to enforce security best practices across infrastructure and development processes.
  • Design, implement, and maintain highly available and scalable AWS infrastructure.
  • Collaborate with developers to optimize application performance and reliability.
  • Develop and maintain monitoring, logging, and alerting systems to ensure system health and performance.
  • Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
  • Respond to and resolve incidents, minimizing downtime and ensuring quick recovery.
  • Support and encourage a diversity of backgrounds, voices, and perspectives on the engineering team.
  • Communicate expectations, progress, and issues to engineers and product managers with clarity and kindness.
  • Spread knowledge, provide mentorship, and promote technical best practices.
  • Write and review high-quality, easy-to-read, and testable code that follows best practices.
  • Manage time successfully by focusing on priorities and delivering on deadlines.
  • Provide engineering input and estimate work during refinement and architecture design.
  • Participate in retrospectives and post-mortems to improve processes and operations.
  • Conduct regular security audits and vulnerability assessments, addressing identified issues.
  • Stay up-to-date with industry trends and emerging technologies.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred).
  • Proven experience as a Staff SRE or in a similar SRE role, with a strong focus on infrastructure and DevOps.
  • Experience maintaining the reliability of online SaaS/PaaS with a 24/7 schedule.
  • Proficiency in AWS and infrastructure as code (e.g., Terraform, CloudFormation).
  • Strong scripting and automation skills (e.g., Python) and in-depth knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
  • Proven experience in implementing CI/CD pipelines and tools (GitHub Actions) and observability tools (Datadog).
  • Commitment to ethical practices, data privacy, and security.
  • Solid understanding of network protocols, security principles, and best practices.
  • Excellent problem-solving skills and the ability to work under pressure.

Nice-to-haves

  • Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Experience implementing Failure Injection / Chaos Engineering practices.
  • Cloud Solution Architect certifications or completed training (e.g., AWS Cloud Practitioner Essentials and/or AWS Certified Solutions Architect - Associate).
  • Strong experience with AWS Solution Architecture across Next.js, Go, PHP APIs, GraphQL, Databricks, and AI/ML workloads.
  • Knowledge of compliance and regulatory standards (e.g., GDPR, HIPAA, ISO 27001, SOC 2).
  • Experience in a non-profit or mission-driven organization.

Benefits

  • 20 paid holidays including federal holidays, Election Day, and a holiday break from December 24 through January 1.
  • Flexible paid time off, including 15 vacation days, 3 personal days, and 7 sick days.
  • Medical, dental, and vision benefits for the staff member and family at no cost to the employee.
  • 403(b) retirement plan with a 3% contribution by Crisis Text Line.
  • 12 weeks paid parental leave after 6 months of employment.
  • Student loan repayment after 2 years of continuous full-time service.
  • Family support through a virtual childcare platform.
  • Monthly stipends for mental health and internet service, and annual stipends for professional development and wellness.
  • One-time home office setup allowance in the first year.