Tesla - Fremont, CA

posted 2 months ago

Full-time - Mid Level
Fremont, CA
Transportation Equipment Manufacturing

About the position

Tesla's Platform Engineering team is seeking a Site Reliability Engineer. In this role, you will be responsible for building and maintaining Kubernetes clusters using infrastructure-as-code tools such as Ansible, Terraform, ArgoCD, and Helm. Your primary focus will be to support application teams in successfully deploying their applications on our platform. The infrastructure you will work with includes a combination of on-premise virtual machines, bare metal hosts, and public cloud services like AWS, which presents unique challenges and opportunities to engage with various infrastructure technologies.

As a Site Reliability Engineer, you will be expected to have expert knowledge of Linux fundamentals, architecture, and performance tuning, along with strong software development skills. Experience running Kubernetes in a production environment is highly desirable, and proficiency in a programming language such as Golang or Python is essential for automating tasks and building necessary tools.

You will be part of a team that manages production-critical workloads across all aspects of Tesla's business, setting standards for other engineering teams and solving some of the most challenging problems in the industry. Your role will involve hands-on collaboration with developers to deploy applications, building new features to enhance platform stability and updates, and managing Kubernetes clusters both on-premise and in the cloud to accommodate our growing workloads. You will participate in the architecture design process and troubleshoot live applications in collaboration with product teams.

Additionally, you will be part of a 24x7 on-call rotation and will influence architectural decisions with a focus on security, scalability, and high performance. You will also be responsible for setting up and maintaining monitoring, metrics, and reporting systems to ensure fine-grained observability and actionable alerting, as well as authoring technical documentation for workflows, processes, and best practices.

Responsibilities

  • Work hands-on with developers to deploy their applications on the platform.
  • Build new features to improve platform stability and updates.
  • Manage Kubernetes clusters on-premise and in the cloud to support growing workloads.
  • Participate in the architecture design process and troubleshoot live applications with product teams.
  • Participate in a 24x7 on-call rotation, including a weekday day shift and weekend shifts every 6-8 weeks.
  • Influence architectural decisions focusing on security, scalability, and high performance.
  • Set up and maintain monitoring, metrics, and reporting systems for observability and alerting.
  • Author technical documentation for workflows, processes, and best practices.

Requirements

  • Experience managing web-scale infrastructure in a production *nix environment.
  • Ability to prioritize tasks and work independently with an analytical mindset and a bias for action.
  • Advanced or expert-level Linux administration and performance tuning skills.
  • Bachelor's Degree in Computer Science, Computer Engineering, or equivalent experience or evidence of exceptional ability.
  • Advanced experience with configuration management and infrastructure-as-code tools such as Ansible, Terraform, or Puppet.
  • Demonstrable knowledge of Linux operating system internals, networking stack, filesystems, resource scheduling, and process management.
  • Exposure to AWS or other cloud infrastructure providers.
  • Experience managing container-based workloads in production using Kubernetes or other orchestration software, along with deployment tooling such as ArgoCD and Helm.
  • Proficiency in a high-level programming language like Python, Go, Ruby, or Java.

Nice-to-haves

  • Experience with additional orchestration tools beyond Kubernetes.
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Knowledge of security best practices in cloud environments.

Benefits

  • Aetna PPO and HSA plans with $0 payroll deduction for medical options.
  • Family-building, fertility, adoption, and surrogacy benefits.
  • Dental and vision plans with options for $0 paycheck contribution.
  • Company paid HSA contribution when enrolled in the High Deductible Aetna medical plan with HSA.
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA).
  • LGBTQ+ care concierge services.
  • 401(k) with employer match and Employee Stock Purchase Plans.
  • Company paid Basic Life, AD&D, short-term and long-term disability insurance.
  • Employee Assistance Program.
  • Sick and vacation time (flex time for salary positions) and paid holidays.
  • Back-up childcare and parenting support resources.
  • Voluntary benefits including critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance.
  • Weight Loss and Tobacco Cessation Programs.
  • Tesla Babies program.
  • Commuter benefits.
  • Employee discounts and perks program.