DevOps/Site Reliability Engineer

Ford - Indianapolis, IN

posted 3 months ago

Full-time - Mid Level

Onsite - Indianapolis, IN

Transportation Equipment Manufacturing

About the position

At Ford, we are committed to building a better world through innovative technology and mobility solutions. The Enterprise Technology team plays a crucial role in shaping the future of transportation, and we are looking for an experienced DevOps and Site Reliability Engineer (SRE) to join our Monitoring as a Service (MaaS) Team. This position is designed for individuals who are eager to leverage advanced technology to redefine the transportation landscape and enhance customer experiences.
As a member of our team, you will be responsible for developing, enhancing, and extending our global monitoring and observability platform, ensuring that our software systems are available, scalable, and maintainable. In this role, you will combine software engineering and systems engineering disciplines to meet the evolving needs of our customers. You will be involved in code and pipeline development, implementing best practices, and automating processes to reduce toil and facilitate adoption.
The MaaS team is focused on providing robust monitoring tools powered by AI and user-friendly dashboards, which will enhance the transparency of application performance across various hosting environments, whether on-premises or in the cloud. As a DevOps/SRE, you will construct API libraries and automation scripts, consult with product teams to onboard new applications to monitoring tools, and work closely with first responders to improve existing application tooling. You will also be responsible for deploying applications using Cloud Run and Tekton pipelines, ensuring a positive user experience for our internal customers.
Your strong background in software development and systems administration will be essential as you architect and develop automation solutions to improve application resilience, recoverability, availability, and scalability. You will collaborate with development teams to design and operate scalable software systems, proactively identify stability risks, and provide technical guidance to team members. Additionally, you will participate in incident response and postmortem analysis to continuously improve our systems and processes.

Responsibilities

Construct API libraries and automation scripts based on existing project workflows, mainly developing in Python.
Consult with Product Teams to onboard new applications to Splunk, Dynatrace, VictorOps, and other Monitoring Applications.
Work with First Responders and Product teams to improve and support tooling for existing applications, including participating in an On-Call rotation schedule for incident management.
Integrate and consolidate application workflows efficiently.
Deploy containerized applications using Cloud Run and Tekton pipelines.
Deliver a positive web user interface/experience to internal Ford customers.
Perform destructive testing to seek and discover vulnerabilities.
Architect, design, and develop automation to improve resilience, recoverability, availability, and scalability of supported applications.
Recognize, validate, and evangelize emerging technologies and architectures that align with business objectives.
Develop tooling to improve reliability, quality, and time-to-market for software solutions.
Measure and optimize system performance, and innovate to extend platform capabilities.
Identify and reduce or eliminate toil via automation to maximize engineering and innovation time.
Collaborate with development teams to design, build, and operate scalable and resilient software systems using Cloud native principles.
Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans.
Regularly review key technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, capacity, and resource utilization.
Assist in establishing an SRE mindset to ensure maximum availability/uptime.
Conduct performance analysis and optimization of new and in-production systems.
Provide technical guidance and mentorship to other team members.
Participate in incident response, support, recovery, and postmortem analysis.

Requirements

Bachelor's degree in Computer Science, Computer Engineering, Systems Engineering or related field, or a combination of education and equivalent work experience.
3+ years of experience as a DevOps Engineer and/or Site Reliability Engineer.
5+ years of experience programming in one or more of Python, Go, Java/Scala, C, C++, or similar languages.
3+ years of experience with APM or other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog.
1+ years of experience with Google Cloud and its services.

Nice-to-haves

Master's Degree in Computer Science, Computer Engineering, Systems Engineering or related field.
Strong experience with J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes (K8s) in developing multi-tier applications.
Experience with automated test-driven development in CI/CD Pipelines.
Thorough understanding of software development and agile programming.
Understanding of, and ability to implement, effective observability strategies to improve mean time to detect and mean time to recover (MTTD/MTTR).
Experience with RESTful APIs and microservices platforms.
Working knowledge of the TCP/IP stack, internet routing and load balancing.

Benefits

Immediate medical, dental, and prescription drug coverage.
Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up childcare and more.
Vehicle discount program for employees and family members, and management leases.
Tuition assistance.
Established and active employee resource groups.
Paid time off for individual and team community service.
A generous schedule of paid holidays, including the week between Christmas and New Year's Day.
Paid time off and the option to purchase additional vacation time.
