Nsight - Boston, MA

posted 11 days ago

Full-time - Senior
Boston, MA
Administrative and Support Services

About the position

The Senior Site Reliability Engineer (SRE) will lead efforts to solve complex operational challenges in mission-critical automated warehouse solutions. This role focuses on driving the stability and sustainability of next-generation systems, conducting Root Cause Analysis (RCA) meetings, and fostering a blame-free environment to gather comprehensive information about incidents and their resolutions. The SRE will work closely with Operations, IT infrastructure, Systems, and Software Engineers to identify resiliency gaps and utilize trends and metrics for continuous improvement.

Responsibilities

  • Conduct Root Cause Analysis meetings and drive the RCA process to conclusion.
  • Analyze critical incident information and create actionable RCA investigation plans.
  • Lead problem tickets and improvements to major software components and systems.
  • Engage in and improve the service lifecycle from inception to deployment and operation.
  • Perform hands-on troubleshooting of VMware, Kubernetes, and infrastructure performance incidents.
  • Act as a trusted technical advisor leading RCA investigations from start to finish.
  • Gather logs and facilitate RCA with cross-functional teams.
  • Assist internal teams with corrective actions and improvement tickets.

Requirements

  • Bachelor's degree in software engineering, information systems, computer science, or a related field.
  • 12+ years of experience with ITSM tools such as Jira or equivalent.
  • 8+ years of infrastructure engineering experience with hands-on troubleshooting in large-scale solutions.
  • 8+ years of experience operating production systems, including troubleshooting and automation.
  • 5+ years of experience leading technical Root Cause Analysis.

Nice-to-haves

  • Experience with executive incident communication and RCA report writing.
  • Ability to communicate technical information to non-technical audiences.
  • Experience with tools such as Prometheus, Grafana, LogicMonitor, Elastic, and VMware, and proficiency with the command line (CLI).
  • Familiarity with Power BI.

Benefits

  • Comprehensive health insurance coverage.
  • 401(k) retirement savings plan with matching contributions.
  • Flexible scheduling options.
  • Professional development opportunities.
  • Paid time off and holidays.