Cognizant Technology Solutions - Phoenix, AZ

posted 16 days ago

Full-time - Mid Level
Phoenix, AZ
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Observability Site Reliability Engineer (SRE) is responsible for ensuring the reliability and scalability of services within the organization. This role focuses on improving Mean Time To Detect (MTTD) and Mean Time To Recover (MTTR) by implementing full-stack observability and by automating non-functional engineering through robust CI/CD pipelines.

Responsibilities

  • Develop and maintain SMART monitoring solutions to enable quicker problem detection and isolation.
  • Strategize and implement deployment models such as Canary or Blue-Green to minimize downtime during deployments.
  • Use increased automation, reusable assets, and self-healing techniques to improve system reliability.
  • Build resiliency across application and infrastructure layers through Chaos Engineering.
  • Embed performance and scalability into application design and code from the initial stages.

Requirements

  • Proven experience in SRE or similar roles with a focus on observability.
  • Strong understanding of CI/CD pipelines and automation tools.
  • Experience with deployment models such as Canary or Blue-Green.
  • Knowledge of Chaos Engineering and its application in building resilient systems.
  • Ability to work collaboratively in a fast-paced environment.
  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Minimum of 3 years in a Site Reliability Engineering role or similar.
  • Proficiency in monitoring tools and technologies.
  • Strong analytical and problem-solving skills.
  • Excellent communication and teamwork abilities.

Nice-to-haves

  • Cloud technologies: experience supporting resources running in GCP and Azure.
  • Prior experience using a Commercial Observability/APM solution (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb)
  • Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana
  • Prior SRE role
  • Experience supporting and troubleshooting issues with critical business apps.
  • Sound knowledge of servers, infrastructure, load balancers, storage, etc.
  • Solid understanding of Unix/Linux and Windows
  • Technologies: Kubernetes, containers, serverless.
  • Languages/Programming: one or more of Bash or ksh, PowerShell, or another common programming language.
  • Prior experience writing and utilizing Terraform.

Benefits

  • Collaborative and inclusive workplace environment.
  • Opportunities for career growth and development.
  • Support for diversity and inclusion initiatives.