Suki - Redwood City, CA

posted 2 months ago

Full-time - Mid Level
Remote - Redwood City, CA
Food Services and Drinking Places

About the position

The Machine Learning Operations Engineer III at Suki will play a crucial role in building and maintaining enterprise-grade cloud infrastructure, primarily on Google Cloud Platform (GCP). This position focuses on creating cloud standards, automating tasks, and keeping the systems the business depends on highly available and reliable. The engineer will collaborate closely with developers to design scalable microservices deployments and will be part of an on-call rotation to support production systems.

Responsibilities

  • Build enterprise-grade cloud infrastructure for business needs, primarily with GCP products.
  • Create cloud standards and best practices that meet the business's high-availability and reliability needs.
  • Use Terraform and Helm to automate GCP tasks and Kubernetes deployments.
  • Route, handle and resolve support tickets related to business requirements in GCP.
  • Work closely with developers to design highly available and scalable deployments of microservices.
  • Provision GCP networking, VPCs and VPN connections.
  • Build fully automated CI/CD processes using Gitflow, Google Cloud Build & ArgoCD.
  • Be part of an on-call rotation to support production systems.
  • Be a creative and dynamic cross-functional technical resource with the endurance for a fast-paced start-up environment.

Requirements

  • GCP Professional and Specialty certifications, with 3+ years of experience.
  • Proficient in a programming or scripting language such as Go or Python.
  • 3+ years of Kubernetes experience managing cloud-native, container-based applications with microservice architecture.
  • Understanding of infrastructure as code, ideally solid experience with Terraform.
  • Database administration experience, including relational schema design, non-relational architectures, and SQL optimization.
  • Proficient in CI/CD tooling and automation using Cloud Build, Jenkins or similar automation platforms.
  • Excellent debugging, problem solving and analytical skills.
  • Experience with developing and maintaining Helm Charts.
  • Solid Linux administration skills.

Nice-to-haves

  • Experience in software development, preferably in DevOps/SRE.
  • Extensive cloud deployment/systems experience, preferably with GCP.
  • Strong understanding of networking, DNS, HTTP and RESTful services.
  • Organized, self-driven, and both capable of and willing to resolve support issues at any tier from L1 through L3 as needed.

Benefits

  • Hybrid work model with three days in the office and two days remote.
  • Opportunity to work in a fast-paced, innovative environment.
  • Impactful work that directly supports healthcare professionals.