Paycom - Oklahoma City, OK

posted 18 days ago

Full-time - Mid Level
Oklahoma City, OK
Professional, Scientific, and Technical Services

About the position

The Site Reliability Engineer (SRE) is responsible for enhancing the reliability and performance of applications, sites, and systems in production. This role involves developing software tools, metrics, and processes to ensure the integrity and functionality of systems, while also collaborating with software development teams to implement monitoring and error detection solutions.

Responsibilities

  • Develop software to detect unusual error activity.
  • Implement workflows and processes designed to identify and reduce the overall number of application/system errors.
  • Collaborate with software development as part of the SDLC to design and implement availability, reliability, and error monitoring solutions in their applications.
  • Take responsibility for removing, isolating, or remediating errors, debug output, warnings, and other messages in existing logs to improve overall log content and usefulness.
  • Limit system downtime by defining and enforcing standards for incident response, error tracking, monitoring, and alerting, with the goal of improving established reliability metrics.
  • Respond effectively to escalated site reliability issues at any time of day while on call.
  • Conduct regular research on best practices and new technology for monitoring, alerting, error tracking and detection, and application performance.

Requirements

  • Bachelor's degree in Computer Science, MIS or related field.
  • 3+ years' experience using alerting and telemetry tools such as Grafana, Prometheus, Splunk, or Dynatrace.
  • 2+ years' experience with Splunk SPL.
  • 2+ years' experience with at least one programming language or platform such as PHP, Python, Java, or .NET.

Nice-to-haves

  • 1+ years' experience with CI/CD.
  • 1+ years' experience with containers and container orchestration tools such as Docker and Kubernetes.
  • 1+ years' experience with Prom.
  • 1+ years' experience with SQL.
  • Troubleshooting in a large-scale networked environment.
  • Knowledge of Paycom's applications, systems, and databases.