Workday - McLean, VA

posted 4 months ago

Full-time - Mid Level
McLean, VA
Publishing Industries

About the position

At Workday, we are looking for a dedicated DevOps/Site Reliability Engineer (SRE) who is passionate about automating, operating, and improving our pioneering cloud-native service platforms. This role supports one or more contracts with the U.S. Federal Government, which require all personnel working on them to be U.S. citizens.

The primary function of our DevOps/SRE team is to ensure the reliability and availability of our platform: meeting desired Service Level Agreements (SLAs), reducing operational load, and scaling sustainably in alignment with business growth. As a key member of the team, you will be responsible for both software engineering and operations, with a focus on reducing operational toil and enhancing the overall customer experience. We work in a scrum environment, planning automation and improvements in two-week sprints. The team is autonomous, with a follow-the-sun on-call rotation that provides continuous support.

Our tech stack is entirely cloud-native and includes Kubernetes, Istio, OPA, GoLang, Ruby/Groovy, ArgoCD, Jenkins, Prometheus, and Grafana. You will be responsible for ensuring safe change and reliability in customer environments, implementing SLO-gated multi-stage deployment automation, and improving platform reliability and observability. This includes developing effective Service Level Indicators (SLIs) to ensure that Service Level Objectives (SLOs) are achieved, building an extendable observability architecture, and establishing new processes in collaboration with platform service teams.

We are looking for someone who is passionate about identifying and solving problems in distributed environments, particularly problems that span configuration, Linux operating systems, and networks. You should have hands-on experience with distributed environments, especially Kubernetes, and a strong belief in the importance of automation for operating large-scale systems. Your drive for customer success will be key in this role, as will your ability to work both independently and collaboratively with diverse global teams. Excellent documentation skills and experience developing detailed runbooks and processes are also essential.

Responsibilities

  • Ensure the reliability and availability of the platform to meet desired SLAs.
  • Reduce operational load and scale sustainably in alignment with business growth.
  • Be a key member of the DevOps/SRE team responsible for software engineering and operations.
  • Plan automation and improvements following scrum practices with two-week sprints.
  • Implement SLO-gated multi-stage deployment automation for customer environments.
  • Develop and launch effective SLIs to ensure SLOs are achieved.
  • Build an extendable observability architecture and establish new processes.
  • Partner with platform service teams to craft and implement SRE standards for their respective services.

Requirements

  • U.S. citizenship (naturalized or native) due to federal government security requirements.
  • Active TS/SCI with CI POLY security clearance.
  • BS/MS in Computer Science or a related field, or an equivalent degree.
  • 4+ years of DevOps or SRE experience in a distributed systems environment.
  • 4+ years of experience with AWS, GCP, or Azure.
  • 4+ years of experience with Kubernetes.
  • Proficiency with a programming language such as GoLang, Python, or Ruby (preferably GoLang).

Nice-to-haves

  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
  • Associate Cloud Engineer certification.
  • Experience with standard software development methodologies such as code management, CI/CD, and testing.

Benefits

  • Workday Bonus Plan eligibility.
  • Annual refresh stock grants.
  • Flexible work schedule allowing for a mix of in-person and remote work.
  • Comprehensive benefits package including health insurance and retirement plans.