Ford - Oklahoma City, OK

posted 23 days ago

Full-time - Mid Level
Onsite - Oklahoma City, OK
Transportation Equipment Manufacturing

About the position

The Site Reliability Engineer (SRE) at Ford Motor Company is responsible for maintaining and enhancing the reliability, scalability, and performance of services on Google Cloud Platform (GCP). This role involves collaborating with development teams to build and manage large-scale distributed systems, ensuring high availability and an optimal user experience. The SRE will also implement monitoring solutions, automate operational tasks, and participate in on-call rotations to address system issues effectively.

Responsibilities

  • Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality
  • Provide helpful and actionable feedback and review for code or production changes
  • Drive repair/optimization of complex systems with consideration towards a wide range of contributing factors
  • Lead debugging, troubleshooting, and analysis of service architecture and design
  • Participate in on-call rotation
  • Write documentation: designs, system analyses, runbooks, and playbooks. Provide design feedback and uplevel the design skills of others
  • Implement and manage monitoring solutions using Dynatrace, Splunk, and OpenTelemetry to ensure visibility and proactive issue detection across our platforms
  • Work within GCP infrastructure, optimizing performance and cost and scaling resources to meet demand
  • Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks
  • Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery
  • Troubleshoot and resolve issues in our dev, test, and production environments
  • Participate in postmortem analysis and create preventative measures for future incidents

Requirements

  • Bachelor's degree in Computer Science, Engineering, or Mathematics, or equivalent experience
  • 3+ years of experience as an SRE, DevOps Engineer, or Software Engineer, or in a similar role
  • Strong experience with monitoring and observability tools, particularly Dynatrace and OpenTelemetry, or comparable tools
  • Proficient with cloud services, with a strong preference for Google Cloud Platform (GCP) experience
  • Solid programming skills in Java, Go, or another programming language, with a good understanding of software development best practices
  • Experience with relational and document databases
  • Familiarity with front-end development frameworks, particularly React
  • Ability to debug, optimize code, and automate routine tasks
  • Strong problem-solving skills and the ability to work under pressure in a fast-paced environment
  • Excellent verbal and written communication skills

Benefits

  • Immediate medical, dental, and prescription drug coverage
  • Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
  • Vehicle discount program for employees and family members, and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year's Day
  • Paid time off and the option to purchase additional vacation time