C3 AI - Redwood City, CA

posted 3 days ago

Full-time - Mid Level
Redwood City, CA
Publishing Industries

About the position

C3.ai, Inc. (NYSE:AI) is a leading Enterprise AI software provider for accelerating digital transformation. The proven C3 AI Platform provides comprehensive services to build enterprise-scale AI applications more efficiently and cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement.

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team to manage, monitor, and optimize our C3 clusters on Kubernetes. The ideal candidate will have a deep understanding of Kubernetes, cloud infrastructure, and Infrastructure as Code (IaC) practices. You will be responsible for ensuring the reliability and scalability of our Kubernetes clusters and cloud infrastructure.

Responsibilities

  • Monitor and Manage Kubernetes Clusters: Ensure the stability, health, and scalability of the Kubernetes clusters on which applications and services are deployed.
  • Kubernetes Management: Deploy, monitor, and scale applications on Kubernetes clusters. Maintain Helm charts, manage services, and ensure resource allocation for optimal cluster performance.
  • Cloud Infrastructure Management: Work with leading Cloud Platforms (AWS, GCP, Azure) to set up, configure, and manage infrastructure resources using Infrastructure as Code (Terraform, CloudFormation, etc.).
  • Monitoring & Incident Response: Set up monitoring solutions, define alerts, and manage the incident response process for any issues related to Jenkins, C3, or Kubernetes clusters.
  • Automate Infrastructure Processes: Build automation tools for scaling, monitoring, and maintaining infrastructure using modern tools like Terraform, Ansible, or equivalent.
  • Collaborate Across Teams: Work closely with development, services, and operations teams to ensure seamless integration between application development and infrastructure.
  • Security & Compliance: Ensure all systems follow security best practices and comply with relevant regulations. This includes role-based access, encryption, and automated vulnerability scanning.

Requirements

  • 3+ years of experience as an SRE, DevOps Engineer, or related role.
  • Hands-on experience with Kubernetes in production environments (managing clusters, deployments, services, and pods).
  • Proficiency in cloud platforms like AWS, GCP, or Azure, including managing infrastructure via IaC tools like Terraform, CloudFormation, or equivalent.
  • Familiarity with monitoring tools like Prometheus, Grafana, or equivalent.
  • Experience with Helm and managing Kubernetes applications via Helm charts.
  • Strong scripting and automation skills in languages like Bash, Python, or Groovy.
  • Experience with CI/CD tools, GitOps, and best practices for continuous integration and delivery pipelines.
  • Understanding of networking concepts and security best practices in a cloud-native environment.
  • Incident management experience, including setting up on-call rotations, managing runbooks, and conducting post-incident reviews.

Benefits

  • Excellent benefits
  • Competitive compensation package
  • Generous equity plan