Confluent - Concord, NH

posted 2 months ago

Full-time - Mid Level
Remote - Concord, NH
Publishing Industries

About the position

At Confluent, we are on a mission to revolutionize the way organizations utilize data through our innovative data streaming technology. As a Federal Site Reliability Engineer, you will play a crucial role in enabling public sector agencies to leverage real-time data for impactful decision-making. This position offers the unique opportunity to work closely with key federal agencies, ensuring that they can act on data instantly and solve pressing challenges through the Confluent Cloud platform.

In this role, you will partner with our Cloud Architecture and Engineering teams to enhance the operational resiliency of Confluent Cloud systems. Your collaboration will extend across various teams to verify and deploy production changes, ensuring that our systems meet the stringent requirements of federal regulations, including FedRAMP data handling standards. You will also be responsible for maintaining critical monitoring systems that facilitate triage and escalations in the federal space, while continuously improving automated recovery processes.

As a Federal Site Reliability Engineer, you will adhere to established incident and change management processes, driving continuous improvements in our operational practices. Your expertise will be vital in ensuring that our systems are not only reliable but also capable of supporting the dynamic needs of federal agencies in real-time.

This position is designed for individuals who are passionate about data and its potential to transform organizations. If you are excited about the prospect of working in a remote-first environment and contributing to the success of public sector initiatives, we encourage you to apply.

Responsibilities

  • Partner with Cloud Architecture and Engineering teams to enhance operational resiliency of Confluent Cloud systems for federal agencies.
  • Collaborate across teams to verify and deploy production changes to Confluent Cloud systems and infrastructure.
  • Engage with peer engineering teams during incidents using an 'escort model' to ensure compliance with FedRAMP data handling requirements.
  • Maintain critical monitoring systems for triage and escalations in the federal space and improve automated recovery processes.
  • Adhere to established incident and change management processes and drive continuous improvements.

Requirements

  • U.S. Citizenship is required to comply with U.S. federal government regulations.
  • 6+ years of relevant experience in site reliability engineering or a related field.
  • Expertise in Cloud Native technologies with experience operating production services in the cloud.
  • Strong fundamentals of Distributed Systems and their design.
  • Deep knowledge of Kubernetes and containerization.
  • Experience with telemetry tooling to monitor production systems.
  • Confidence in problem-solving and troubleshooting critical services.
  • Proficiency with scripting and automation (e.g., Go, Java, Python, Bash).
  • Working knowledge of infrastructure as code (e.g., Terraform, CloudFormation, AWS CDK, Pulumi).
  • Exceptional teamwork and collaboration skills, with the ability to work independently in a remote-first environment.
  • Experience with a rotating on-call schedule to provide 24/7 support.
  • BS Degree in Computer Science, Engineering, or equivalent experience.

Benefits

  • Competitive salary ranging from 145,920 to 171,440 USD annually.
  • Annual bonus and competitive equity package.
  • Wide range of employee benefits including health insurance, retirement plans, and more.