Senior Site Reliability Engineer, FedRAMP

Cisco - San Francisco, CA

Posted 4 months ago

Full-time - Senior

Computer and Electronic Product Manufacturing

About the position

The FedRAMP Site Reliability Engineer (SRE) role at Cisco ThousandEyes is pivotal in ensuring the reliability and performance of our Federal region's infrastructure and operations. This position is responsible for managing all aspects of the Federal region's platform, which includes availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning, with a strong emphasis on security. The SRE team operates under the principle of treating operations and infrastructure as code, which enhances the efficiency and effectiveness of our distributed team.
As a Senior Site Reliability Engineer, you will be tasked with maintaining a robust and scalable infrastructure capable of handling a high volume of incoming data daily. You will collaborate closely with software engineers to design and optimize the ThousandEyes platform's infrastructure and services, ensuring they meet the highest standards for availability, latency, and performance. Your role will also involve the design, implementation, and management of FedRAMP-compliant infrastructure and systems, establishing processes for continuous monitoring, logging, and auditing to ensure compliance with FedRAMP controls. In addition, you will work alongside security teams to identify and remediate vulnerabilities, conduct security assessments, and implement necessary security controls.
You will be responsible for designing and implementing dynamic infrastructure solutions that support the growth and scaling of our platform, particularly in multi-region environments. Your expertise in automation will be crucial in enabling our infrastructure and platforms to scale effortlessly, with a special focus on FedRAMP systems. Staying updated on industry best practices, evolving security threats, and changes to FedRAMP guidelines will be essential to improving the security posture of our systems.
This role also includes designing, deploying, and maintaining cloud-native services in AWS that are elastic and resilient to failure, participating in incident response, and contributing to our 24x7 on-call rotation. Capacity planning for the infrastructure and platform will be a key responsibility, helping teams prepare for future growth.

Responsibilities

Join forces with software engineers to ensure the ThousandEyes platform's Federal region infrastructure and services are designed and optimized for availability, latency, and performance.
Design, implement, and manage FedRAMP-compliant infrastructure and systems.
Establish and maintain processes for continuous monitoring, logging, and auditing of systems to ensure compliance with FedRAMP controls.
Collaborate with security teams to identify and remediate vulnerabilities, conduct security assessments, and implement necessary security controls.
Design and implement dynamic infrastructure solutions to support multi-region scaling.
Drive and build automation to enable infrastructure and platforms to scale effortlessly, focusing on FedRAMP systems.
Stay updated on industry best practices, evolving security threats, and updates to FedRAMP guidelines to improve security posture.
Design, deploy, and maintain cloud-native services in AWS that are elastic and resilient to failure.
Participate in and contribute to improving our 24x7 incident response and on-call rotation.
Conduct capacity planning for the infrastructure and platform to prepare for growth.

Requirements

5+ years of experience in a relevant field.
Experience building and/or operating FedRAMP environments.
Experience identifying and analyzing cyber security risks.
Solid understanding of the FedRAMP framework, its controls, and compliance requirements.
Familiarity with standard security processes, vulnerability management, and incident response processes.
Ability to write high-quality code in Python, Go, or equivalent languages.
Ability to build and implement scalable and well-tested solutions.
Good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.
Knowledge of cloud providers, ideally AWS.
Infrastructure as Code skills, ideally with Terraform, Puppet, and Kubernetes.
Good communication and documentation skills.
Solid sense of ownership, drive, and enthusiastic attention to detail.

Nice-to-haves

Experience with incident response processes in a cloud environment.
Familiarity with monitoring and logging tools such as Prometheus, Grafana, or similar.
Experience with container orchestration tools like Kubernetes.
Knowledge of networking concepts and protocols.

Benefits

Medical, dental, and vision insurance coverage.
401(k) plan with Cisco matching contribution.
Short- and long-term disability coverage.
Basic life insurance.
Numerous wellbeing offerings.
Up to twelve paid holidays per calendar year, including one floating holiday and a day off for the employee's birthday.
Accrual of up to 20 days of Paid Time Off (PTO) each year.
Paid time away to deal with critical or emergency issues without tapping into PTO.
Additional paid time to volunteer and give back to the community.
Employee Stock Purchase Program allowing employees to purchase company stock.
