Senior Site Reliability Engineer I/II

$135,000 - $195,000/Yr

Umbra Lab - Santa Barbara, CA

posted 3 months ago

Full-time - Mid Level

Remote - Santa Barbara, CA

Computer and Electronic Product Manufacturing

About the position

Umbra is seeking a Senior Site Reliability Engineer I/II to join our team in Santa Barbara, California, or to work fully remotely. As a technology company focused on building next-generation space systems, we are on a mission to deliver global omniscience through our innovative satellite data solutions. This role is critical in helping us design, build, operate, and scale our mission- and business-critical infrastructure.

The ideal candidate will possess a deep understanding of the entire technology stack and architecture, enabling informed decisions about technical debt and trade-offs. They will demonstrate leadership in technical innovation and excellence, researching and advocating for new technologies and best practices while challenging and refining existing processes to improve efficiency and effectiveness across projects and services.

In this position, you will ensure the reliability and scalability of critical systems, meeting service level agreements (SLAs) through proactive monitoring and effective incident response. You will develop and promote new technologies and tools, conducting research and creating proofs of concept to introduce solutions that enhance the team's capabilities. Collaboration is key: you will work closely with cross-functional teams, product managers, and stakeholders to align on technical strategy and provide expert guidance. You will also participate in on-call rotations, providing support and resolving complex technical issues as they arise.

This role requires a proactive approach to continuous improvement, in which you will evaluate and improve team processes and workflows to increase efficiency and reduce complexity. Your ability to communicate effectively with both technical and non-technical stakeholders will foster collaboration and understanding across diverse teams, driving changes that benefit the entire organization.

Responsibilities

Ensure the reliability and scalability of critical systems, meeting SLAs through proactive monitoring and effective incident response.
Develop and promote new technologies and tools, conducting research and creating proofs of concept to introduce solutions that enhance the team's capabilities.
Lead by example in fostering a culture of excellence and reliability.
Continuously evaluate and improve team processes and workflows to increase efficiency and reduce complexity.
Collaborate closely with cross-functional teams, product managers, and stakeholders to align on technical strategy and provide expert guidance.
Participate in on-call rotations, providing support and resolving complex technical issues.
Perform all other duties as assigned.

Requirements

6+ years in a Site Reliability Engineer or DevOps role supporting a SaaS platform, with demonstrated expertise managing distributed systems.
Extensive experience with AWS services (EC2, S3, Lambda, VPC Networking) and deep knowledge of cloud infrastructure, networking, and security best practices.
Proficiency running, optimizing, and scaling Kubernetes clusters in production environments.
Experience using and writing Terraform to architect and manage production infrastructure.
Ability to apply infrastructure as code (IaC), GitOps practices, and automation tools to increase reliability and reduce manual tasks.
Proven success in leading teams or projects using Agile/Scrum methodologies.
Expertise in infrastructure and software architecture, capable of designing and implementing large-scale, reliable systems with minimal guidance.
Experience developing and managing comprehensive infrastructure monitoring and alerting strategies.

Nice-to-haves

Advanced understanding of cloud and application security, identity management, and compliance.
Expertise in service mesh and service registration technologies, focusing on performance and reliability.
Bachelor's degree in Computer Science or a related field, or equivalent professional experience.

Benefits

Flexible Time Off, Sick, Family & Medical Leave
Medical, Dental, Vision, Life, Long-Term Disability (LTD), Short-Term Disability (STD) (employer funded)
Voluntary Life, Critical Illness, Accidental, Hospital Indemnity, Pet Insurance (employee funded)
401(k) with 3% non-elective company contribution
Stock Options
