Ford - Juneau, AK

posted about 2 months ago

Full-time - Mid Level
Onsite - Juneau, AK
Transportation Equipment Manufacturing

About the position

At Ford, we are committed to building a better world through innovative technology and mobility solutions. The Enterprise Technology team plays a crucial role in shaping the future of transportation, enhancing customer experiences, and improving lives. We are looking for a talented DevOps and Site Reliability Engineer (SRE) to join our Monitoring as a Service (MaaS) Team. This team is dedicated to developing and evolving services that prioritize customer needs, providing robust monitoring tools powered by AI and user-friendly dashboards. The role involves ensuring the availability, scalability, and maintainability of our global monitoring and observability platform, which is essential for modernizing and disrupting our services.

As a DevOps/SRE, you will combine software engineering and systems engineering disciplines to lead the development and enhancement of our monitoring services. Your responsibilities will include constructing API libraries and automation scripts, consulting with product teams to onboard new applications to monitoring tools, and improving tooling for existing applications. You will also be involved in deploying applications to containers, delivering a positive user experience, and architecting automation solutions to enhance application resilience and scalability.

This position requires a strong background in software development, systems administration, and excellent problem-solving skills. In this role, you will proactively identify stability risks, collaborate with development teams, and provide technical guidance and mentorship. You will participate in incident response and postmortem analysis, ensuring maximum availability and uptime for our systems. This is an exciting opportunity to leverage your expertise in a dynamic environment and contribute to the future of mobility at Ford.

Responsibilities

  • Constructing API Libraries & automation scripts based on existing project workflows, mainly developing in Python
  • Consulting with Product Teams to onboard new applications to Splunk, Dynatrace, VictorOps, and other Monitoring Applications
  • Work with First Responders and Product teams to improve and support tooling for existing applications; this may include participating in an on-call rotation for incident management
  • Integrating & consolidating application workflows efficiently
  • Deploying applications to containers using Cloud Run and Tekton pipelines
  • Delivering a positive web user interface/experience to our internal Ford customers
  • Leverage experience to safely perform destructive testing to seek and discover vulnerabilities
  • Architect, design and develop automation to improve resilience, recoverability, availability, and scalability of supported applications
  • Recognize, validate, and evangelize emerging technologies and architectures that align with business objectives
  • Develop tooling to improve reliability, quality, and time-to-market for software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation
  • Collaborate with development teams to design, build, and operate scalable and resilient software systems using Cloud native principles
  • Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans
  • Regularly review key technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, capacity, and resource utilization
  • Assist in establishing an SRE mindset to ensure maximum availability/uptime
  • Conduct performance analysis and optimization of new and in-production systems
  • Provide technical guidance and mentorship to other team members
  • Participate in incident response, support, recovery, and postmortem analysis

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, Systems Engineering, or a related field, or a combination of education and equivalent work experience
  • 3+ years of experience as a DevOps Engineer and/or Site Reliability Engineer
  • 5+ years of experience programming with one or more of the following: Python, Go, Java/Scala, C, C++, or similar technologies
  • 3+ years of experience with any APM and other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, DataDog
  • 1+ years with Google Cloud and its library of services

Nice-to-haves

  • Master's degree in Computer Science, Computer Engineering, Systems Engineering, or a related field
  • Strong experience with J2EE, NoSQL/SQL Datastore, Spring Boot, GCP/AWS/Azure & Docker/K8s in developing multi-tier applications
  • Experience with automated test-driven development in CI/CD Pipelines
  • Thorough understanding of software development and agile programming
  • Understanding of and ability to implement effective observability strategies to improve MTTD/MTTR
  • Experience with RESTful APIs and microservices platforms
  • Working knowledge of the TCP/IP stack, internet routing and load balancing

Benefits

  • Immediate medical, dental, and prescription drug coverage
  • Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up childcare and more
  • Vehicle discount program for employees and family members, and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year's Day
  • Paid time off and the option to purchase additional vacation time