Confluent - Boston, MA

posted 2 months ago

Full-time - Mid Level
Remote - Boston, MA
Publishing Industries

About the position

At Confluent, we are on a mission to change how organizations use data through our data streaming technology. As a Federal Site Reliability Engineer, you will help public sector agencies use real-time data for critical decision-making. You will work closely with key federal agencies and deliver highly performant, reliable systems through Confluent Cloud, an end-to-end streaming platform offered as Software as a Service (SaaS).

In this role, you will partner with our Cloud Architecture and Engineering teams to enhance the operational resiliency of the Confluent Cloud systems used by federal agencies. You will collaborate across teams to verify and deploy production changes, ensuring that our systems meet the stringent compliance requirements of FedRAMP data handling. You will also maintain critical monitoring systems for triage and escalations in the federal space, continuously improve automated recovery processes, and adhere to established incident and change management processes while helping drive ongoing improvements in our systems and practices.

This position is ideal for individuals who are passionate about data and have a strong background in cloud-native technologies. You will be at the forefront of enabling intelligent, real-time applications that let teams and systems act on data instantly, with a significant impact on the public sector's ability to solve real-time problems.

Responsibilities

  • Partner with Cloud Architecture and Engineering teams to enhance operational resiliency of Confluent Cloud systems for federal agencies.
  • Collaborate across teams to verify and deploy production changes to Confluent Cloud systems and infrastructure.
  • Engage with peer engineering teams during incidents using an 'escort model' to ensure compliance with FedRAMP data handling requirements.
  • Maintain critical monitoring systems for triage and escalations in the federal space and improve automated recovery processes.
  • Adhere to established incident and change management processes and drive continuous improvements.

Requirements

  • U.S. Citizenship is required to comply with U.S. federal government regulations.
  • 6+ years of relevant experience in site reliability engineering or a related field.
  • Expertise in cloud-native technologies, with experience operating production services in the cloud.
  • Strong fundamentals of distributed systems and their design.
  • Deep knowledge of Kubernetes and containerization.
  • Experience with telemetry tooling to monitor production systems.
  • Confidence in problem-solving and troubleshooting critical services.
  • Proficiency with scripting and automation (e.g., Go, Java, Python, Bash).
  • Working knowledge of infrastructure as code (e.g., Terraform, CloudFormation, AWS CDK, Pulumi).
  • Exceptional teamwork and collaboration skills, with the ability to think critically and work with minimal supervision in a remote-first environment.
  • Experience with a rotating on-call schedule to provide 24/7 support.
  • BS degree in Computer Science or Engineering, or equivalent experience.

Benefits

  • Competitive pay and benefits in line with industry standards.
  • Annual estimated salary of $145,920 - $171,440 USD.
  • Annual bonus and competitive equity package.
  • Wide range of employee benefits.