Vista Higher Learning - Boston, MA

posted 22 days ago

Full-time
Remote - Boston, MA
Publishing Industries

About the position

As a Site Reliability Engineer (SRE) at Vista Higher Learning (VHL), you will enhance operational visibility within the DevOps team, focusing on improving the observability of applications and cloud infrastructure. The role involves monitoring systems, troubleshooting outages, and keeping web-based services reliable, as well as documenting incidents and participating in recovery efforts.

Responsibilities

  • Improve operational visibility into services and infrastructure to preempt outages.
  • Respond quickly to outages and site reliability incidents, collaborating with the team for recovery.
  • Document site reliability incidents with detailed reporting and root cause analysis.
  • Research and implement new tools to enhance operational visibility and monitoring capabilities.
  • Manage on-call rotations to ensure 24/7 SRE coverage.
  • Engage with the security team to address vulnerabilities and incidents.

Requirements

  • High school diploma or GED.
  • 3+ years of experience as a Site Reliability Engineer (SRE) in a high-traffic production environment.
  • 3+ years of experience in a *nix environment with command line proficiency.
  • 3+ years of experience with cloud-based monitoring and analytics platforms like Datadog.
  • Hands-on experience with Amazon Web Services (AWS).
  • Familiarity with Docker and container orchestration tools.
  • Experience troubleshooting complex networking issues.
  • Hands-on experience with programming languages such as Ruby or Python.
  • Knowledge of relational databases and services like AWS RDS.
  • Experience with Git and GitHub.
  • Ability to analyze and monitor data related to cloud service health.
  • Experience documenting incidents and preparing Root Cause Analysis (RCA) or After Action Reports (AAR).
  • Strong communication skills to convey technical information to non-technical stakeholders.
  • Willingness to work on-call outside of normal business hours.

Nice-to-haves

  • Project management experience.
  • Experience with Kubernetes and/or Elastic Container Service (ECS).
  • Familiarity with the edtech industry.

Benefits

  • Remote/Hybrid work options.
  • Flexible work schedule.
  • Opportunities for professional development.