Tesla - Fremont, CA
posted 2 months ago
Tesla's Platform Engineering team is seeking a Site Reliability Engineer. In this role, you will build and maintain Kubernetes clusters using infrastructure-as-code and deployment tools such as Ansible, Terraform, ArgoCD, and Helm. Your primary focus will be helping application teams deploy their applications on our platform successfully. The infrastructure spans on-premise virtual machines, bare-metal hosts, and public cloud services such as AWS. This mix brings unique challenges and the opportunity to work with a wide range of infrastructure technologies.

You are expected to have expert knowledge of Linux fundamentals, architecture, and performance tuning, along with strong software development skills. Experience running Kubernetes in production is highly desirable, and proficiency in a programming language such as Golang or Python is essential for automating tasks and building internal tools.

You will join a team that manages production-critical workloads across every part of Tesla's business, sets standards for other engineering teams, and solves some of the most challenging problems in the industry. You will work hands-on with developers to deploy applications, build new features that improve the platform's stability and its update process, and manage Kubernetes clusters both on-premise and in the cloud to accommodate our growing workloads. You will take part in the architecture design process, troubleshoot live applications together with product teams, and influence architectural decisions with a focus on security, scalability, and high performance. You will also be part of a 24x7 on-call rotation.

In addition, you will set up and maintain monitoring, metrics, and reporting systems that provide fine-grained observability and actionable alerting, and you will author technical documentation for workflows, processes, and best practices.