Paycom - Oklahoma City, OK

posted 2 months ago

Full-time - Mid Level
Oklahoma City, OK
Professional, Scientific, and Technical Services

About the position

The Site Reliability Engineer (SRE) at Paycom is responsible for creating software tools, metrics, and processes that enhance the reliability of applications, sites, and systems in production. This full-time role focuses on ensuring the integrity and functionality of applications while mentoring junior team members to foster their development.

Responsibilities

  • Architect solutions that proactively reduce or eliminate errors and incidents in production systems.
  • Review and approve software and processes developed by junior site reliability engineers.
  • Review code and approve error logging and monitoring in new software across all company-developed applications.
  • Take responsibility for removing, isolating, or remediating error, debug, warning, and other messages in existing logs to improve overall log content and usefulness.
  • Establish, implement, and track reliability metrics (MTTR, MTTD, MTBF).
  • Respond effectively to escalated site reliability issues at any time of day while on call.
  • Conduct regular research on best practices and new technology for monitoring, alerting, error tracking and detection, and application performance.
  • Mentor and guide junior site reliability engineers.

Requirements

  • Bachelor's degree in Computer Science, MIS, or related field.
  • 5+ years' experience using alerting and telemetry tools such as Grafana, Prometheus, Splunk, or Dynatrace.
  • 3+ years' experience with Splunk SPL.
  • 3+ years' experience in software development with at least one programming language such as PHP, Python, Java, or .NET.
  • 2+ years' experience creating and tuning analytical tools in Splunk.

Nice-to-haves

  • 2+ years' experience with CI/CD.
  • 2+ years' experience with containers and container orchestration, such as Docker and Kubernetes.
  • 2+ years' experience with Prometheus PromQL.
  • 2+ years' experience with SQL.
  • Troubleshooting in a large-scale networked environment.
  • Knowledge of Paycom's applications, systems, and database.