INSPYR Solutions - Miami, FL

posted 8 days ago

Full-time - Mid Level
Miami, FL
Administrative and Support Services

About the position

The Site Reliability Engineer will play a critical role in ensuring the reliability, performance, and seamless operation of Royal Caribbean Cruise Lines' digital ecosystem, which includes guest-facing mobile apps, websites, and backend systems. The engineer will collaborate with development, operations, and product teams to build and maintain a highly resilient and scalable digital experience for guests.

Responsibilities

  • Respond to and resolve production incidents, prioritizing guest-facing issues to minimize disruption.
  • Conduct root cause analysis and implement preventive measures to avoid recurrence.
  • Build, maintain, and enhance monitoring tools and dashboards to provide visibility into system health and performance.
  • Develop and implement automation scripts and tools to streamline operations and improve system reliability.
  • Work closely with product teams to incorporate reliability principles into new feature development.
  • Create and maintain clear documentation on system architecture and incident postmortems.
  • Participate in the on-call rotation to acknowledge and escalate incidents.

Requirements

  • Strong knowledge of mobile (iOS, Android) and web technologies, backend systems, cloud infrastructure (AWS, Azure), and database technologies.
  • Proficiency in one or more programming languages (e.g., Python, Java, Go) for scripting and automation.
  • Experience with monitoring tools like Prometheus, Grafana, or Splunk.
  • Experience with incident management tools like PagerDuty or ServiceNow.
  • Understanding of security best practices and incident response.
  • Excellent written and verbal communication skills.
  • Ability to work with large volumes of customer data and use Oracle SQL (or similar) to query databases.

Nice-to-haves

  • 5+ years of demonstrated proficiency in one or more scripting languages such as Python or Go.
  • 3+ years of experience with Kubernetes or equivalent.
  • 5+ years of software development experience in Java or JavaScript.
  • 3+ years of experience with containers and container orchestrators like Docker and Kubernetes.
  • 5+ years of experience debugging and fixing system/infrastructure and application issues.
  • 5+ years of experience working with monitoring tools such as Prometheus, Grafana, or Google Stackdriver.
  • 5+ years of experience with databases (SQL or NoSQL).
  • 5+ years of experience with log analysis and building dashboards.
  • At least 6 years in a Reliability Engineering, DevOps, or infrastructure-focused role.

Benefits

  • Comprehensive medical benefits
  • Competitive pay
  • 401(k) retirement plan