Ford - Lansing, MI
posted 3 months ago
Ford Motor Company is seeking an experienced DevOps and Site Reliability Engineer (SRE) to join our Monitoring as a Service (MaaS) team. This role is pivotal in leading the development, enhancement, and extension of our global monitoring and observability platform. As a DevOps/SRE, you will combine software engineering and systems engineering disciplines to ensure that our software systems are available, scalable, and maintainable.

You will play a crucial role in meeting the evolving needs of our customers. This includes developing code and pipelines, establishing best practices with associated templates, and automating processes to reduce toil and facilitate adoption. In this position, you will build API libraries and automation scripts, primarily in Python, based on existing project workflows. You will consult with product teams to onboard new applications to monitoring tools such as Splunk, Dynatrace, and VictorOps. You will also work closely with first responders and product teams to improve and support tooling for existing applications, which may include participating in an on-call rotation for incident management.

Your role will also involve efficiently integrating and consolidating application workflows, deploying containerized applications using Cloud Run and Tekton pipelines, and delivering a positive web user interface and experience to our internal Ford customers. You will leverage your strong background in software development and systems administration to perform destructive testing that uncovers vulnerabilities, and to architect and develop automation that improves the resilience, recoverability, availability, and scalability of supported applications. You will also be responsible for measuring and optimizing system performance, proactively identifying stability risks, and collaborating with development teams to design, build, and operate scalable and resilient software systems using cloud-native principles.
Your contributions will help establish an SRE mindset within the team that ensures maximum availability and uptime. You will also conduct performance analysis and provide technical guidance and mentorship to other team members.