Confluent - Boston, MA

posted 2 months ago

Full-time - Mid Level
Boston, MA
Publishing Industries

About the position

At Confluent, we are on a mission to harness the full power of continuously flowing data, enabling organizations to innovate and thrive in the modern digital landscape. As a Federal Site Reliability Engineer, you will play a pivotal role in delivering highly performant and reliable systems that empower public sector agencies to make real-time decisions with their data. This position offers the unique opportunity to work closely with key agencies, ensuring that they can solve pressing problems through the capabilities of Confluent Cloud, our end-to-end streaming experience delivered as a Software as a Service (SaaS) model.

In this role, you will partner with our Cloud Architecture and Engineering teams to enhance the operational resiliency of Confluent Cloud systems utilized by federal agencies. Your collaboration will extend across various teams to verify and deploy production changes, ensuring that our systems meet the stringent compliance requirements of FedRAMP data handling. You will also maintain critical monitoring tools for triage and escalations in the federal space, while continuously improving automated recovery processes. Adhering to established incident and change management processes will be essential, as you help drive ongoing improvements in our systems and practices.

This position requires a passion for data and a commitment to delivering exceptional service in a remote-first environment. You will be expected to engage actively during incidents, utilizing an “escort model” to ensure compliance and operational excellence. Your expertise in cloud-native technologies, distributed systems, and containerization will be crucial as you contribute to the success of our federal clients and the broader mission of Confluent.

Responsibilities

  • Partner with Cloud Architecture and Engineering teams to enhance operational resiliency of Confluent Cloud systems for federal agencies.
  • Collaborate across teams to verify and deploy production changes to Confluent Cloud systems and infrastructure.
  • Engage during incidents using an 'escort model' to ensure compliance with FedRAMP data handling requirements.
  • Maintain critical monitoring tools for triage and escalations in the federal space and improve automated recovery processes.
  • Adhere to established incident and change management processes and drive continuous improvements.

Requirements

  • U.S. Citizenship is required to comply with U.S. federal government regulations.
  • 6+ years of relevant experience in site reliability engineering or a related field.
  • Expertise in cloud-native technologies, with experience operating production services in the cloud.
  • Strong grasp of distributed systems fundamentals and design.
  • Deep knowledge of Kubernetes and containerization.
  • Experience with telemetry tooling to monitor production systems.
  • Confidence in problem-solving and troubleshooting critical services.
  • Proficiency with scripting and automation (e.g., Go, Java, Python, Bash).
  • Working knowledge of infrastructure as code (e.g., Terraform, CloudFormation, AWS CDK, Pulumi).
  • Exceptional teamwork and collaboration skills, with the ability to think critically and act with minimal supervision in a remote-first environment.
  • Experience participating in a rotating on-call schedule that provides 24/7 support.
  • BS degree in Computer Science, Engineering, or equivalent experience.

Benefits

  • Competitive pay and benefits in line with industry standards.
  • Annual estimated salary of $145,920 - $171,440 USD.
  • Annual bonus and competitive equity package.
  • Wide range of employee benefits.