Ford - Indianapolis, IN
posted 3 months ago
At Ford, we are committed to building a better world through innovative technology and mobility solutions. The Enterprise Technology team plays a crucial role in shaping the future of transportation, and we are looking for an experienced DevOps and Site Reliability Engineer (SRE) to join our Monitoring as a Service (MaaS) Team. This position is designed for individuals who are eager to use advanced technology to redefine the transportation landscape and enhance customer experiences.
As a member of our team, you will develop, enhance, and extend our global monitoring and observability platform, ensuring that our software systems are available, scalable, and maintainable. In this role, you will combine software engineering and systems engineering disciplines to meet the evolving needs of our customers. You will be involved in code and pipeline development, implementing best practices, and automating processes to reduce toil and facilitate adoption.
The MaaS team focuses on providing robust, AI-powered monitoring tools and user-friendly dashboards that make application performance more transparent across hosting environments, whether on-premises or in the cloud. As a DevOps/SRE, you will build API libraries and automation scripts, consult with product teams to onboard new applications to monitoring tools, and work closely with first responders to improve existing application tooling. You will also deploy applications using Cloud Run and Tekton pipelines, ensuring a positive user experience for our internal customers. Your strong background in software development and systems administration will be essential as you architect and develop automation solutions that improve application resilience, recoverability, availability, and scalability.
You will collaborate with development teams to design and operate scalable software systems, proactively identify stability risks, and provide technical guidance to team members. Additionally, you will participate in incident response and postmortem analysis to continuously improve our systems and processes.