Sky Solutions Sarl - Fort Worth, TX

posted 19 days ago

Full-time
Fort Worth, TX
Professional, Scientific, and Technical Services

About the position

The SRE Engineer role focuses on enhancing the reliability and scalability of production systems within the organization. The position involves collaborating with SRE teams, product owners, and cross-functional teams to ensure optimal performance and uptime. The engineer will facilitate Agile methodologies, track performance metrics, and advocate for best practices in Site Reliability Engineering.

Responsibilities

  • Assist SRE teams in defining and achieving goals by organizing and facilitating ceremonies such as daily stand-ups, sprint planning, sprint reviews, and retrospectives.
  • Align SRE activities with Agile methodologies, focusing on incident management, problem resolution, and reliability improvement.
  • Identify and remove impediments or blockers that may hinder the team's progress.
  • Track and analyze key performance indicators (KPIs) related to reliability, system performance, and team productivity. Report these metrics to stakeholders and leadership.
  • Drive continuous improvement initiatives across SRE processes, leveraging feedback from retrospectives and performance data.
  • Work closely with SRE engineers, product owners, and stakeholders to align the team's work with organizational goals.
  • Advocate for SRE best practices such as monitoring, alerting, automation, and system health reviews to ensure system stability and availability.
  • Coordinate and facilitate post-incident reviews, ensuring teams identify and implement action items to prevent future occurrences.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 3+ years of experience as a developer in a technology organization.
  • 2+ years of experience working with Site Reliability Engineering (SRE), DevOps, or Infrastructure teams.
  • Familiarity with SRE practices such as incident management, SLOs/SLIs, and automation.
  • Experience with tools such as Dynatrace, ThousandEyes, and ServiceNow.
  • Experience with Python and Java.
  • Excellent communication, leadership, and facilitation skills.
  • Understanding of cloud platforms (AWS, Google Cloud Platform, Azure), CI/CD pipelines, and observability tools (Prometheus, Grafana, ELK Stack).

Nice-to-haves

  • Previous experience working directly with SRE or DevOps teams.
  • Understanding of infrastructure-as-code tools like Terraform or CloudFormation.
  • Knowledge of containerization and orchestration technologies such as Docker and Kubernetes.
© 2024 Teal Labs, Inc