This job is closed

Lucid Motors • Posted 3 months ago
Full-time • Senior
Newark, CA
Transportation Equipment Manufacturing

About the position

At Lucid, we set out to introduce the most captivating, luxury electric vehicles that elevate the human experience and transcend the perceived limitations of space, performance, and intelligence. Vehicles that are intuitive, liberating, and designed for the future of mobility. We plan to lead in this new era of luxury electric by returning to the fundamentals of great design - where every decision we make is in service of the individual and environment. Because when you are no longer bound by convention, you are free to define your own experience.

Come work alongside some of the most accomplished minds in the industry. Beyond providing competitive salaries, we're providing a community for innovators who want to make an immediate and significant impact. If you are driven to create a better, more sustainable future, then this is the right place for you.

We're looking for a Technical Specialist Site Reliability Engineer to join our dynamic, fast-paced team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of Lucid Motors' cloud-based applications deployed across a range of public and private cloud infrastructures. As part of this high-impact team, you will collaborate with engineers, software developers, and product teams to design, build, automate, and maintain cutting-edge cloud infrastructure that powers our applications. The ideal candidate is a hands-on engineer with a can-do attitude, a strong passion for reliability engineering, and a commitment to continuous improvement.

Responsibilities

  • Own and enhance the reliability of services deployed across various cloud regions.
  • Lead the containerization and deployment of microservices and data pipelines on Kubernetes, using Helm charts, ensuring best practices for scalability and fault tolerance.
  • Foster and advocate for a DevOps culture that emphasizes automation, self-service, and engineering excellence.
  • Implement autoscaling strategies and monitor the performance of applications and infrastructure with tools like Prometheus, Grafana, and other observability platforms.
  • Perform SRE tasks such as availability monitoring, incident response, post-mortem analysis, and preparing reliability reports for leadership and stakeholders.
  • Deploy, configure, and maintain essential cloud services and tools including Kafka, Spark, Presto, Airflow, MQTT, and other microservices platforms in a cloud-native environment.
  • Set up and manage cloud infrastructure using tools like Terraform, Cluster API, and other IaC frameworks, ensuring seamless provisioning, management, and scaling of resources.
  • Continuously enhance and automate alerting, incident detection, and recovery mechanisms for critical applications and services to minimize downtime and improve system reliability.
  • Participate in an on-call rotation to meet business SLAs, quickly troubleshoot and resolve issues, and document runbooks for consistent incident management processes.
  • Work closely with Product Owners, Engineering Managers, and cross-functional teams in Agile Scrum and Kanban workflows to deliver iterative improvements and meet evolving business needs.
  • Perform impact analysis during incidents, collaborate with teams for root cause analysis, and implement preventive measures to avoid recurrence.

Requirements

  • B.S. or M.S. degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 8+ years in Site Reliability Engineering (SRE), DevOps Engineering, or related fields.
  • 4+ years of hands-on experience deploying, managing, and optimizing containerized applications using Docker and Kubernetes in both public and private cloud environments (AWS, GCP, Azure, etc.).
  • 4+ years in Infrastructure-as-Code (IaC) using Terraform, Cluster API, or similar automation frameworks to manage cloud infrastructure.
  • Experience in scripting or programming with Python, Go, Bash/Shell, or similar languages.
  • Strong understanding of Prometheus, Grafana, and other monitoring and observability tools.
  • Ability to effectively diagnose and resolve performance bottlenecks within AWS at the infrastructure and application layers.

Nice-to-haves

  • Experience with configuration management and automation tools such as Ansible, Chef, or Puppet (preferred but not required).

Job Keywords

Hard Skills
  • Ansible
  • Kubernetes
  • Prometheus
  • Terraform