Nexthink - Boston, MA

posted 3 months ago

Full-time - Senior
Remote - Boston, MA
501-1,000 employees
Professional, Scientific, and Technical Services

About the position

Nexthink is seeking a Director of Cloud Operations to build and manage a high-performance cloud platform and Site Reliability Engineering (SRE) operations. The position primarily supports US-based operations, with particular emphasis on delivering services to the US Public Sector market, including a FedRAMP Moderate offering. The successful candidate will develop modern, cloud-native SRE processes and oversee the management and operations of Nexthink's multi-tenant, microservices-based cloud platform, which has multiple instances deployed globally.
In this role, you will collaborate closely with engineering teams to enhance our Continuous Integration/Continuous Deployment (CI/CD) pipeline and ensure high-quality product releases. You will bring a solid SRE mindset to the organization, driving the adoption of industry best practices while managing operations within a security- and compliance-centric delivery model.
Your responsibilities will include leading all operations and SRE functions within the US organization, managing incident response, and implementing forward-thinking monitoring strategies. You will also own compliance and evidence-gathering activities for regulated deployments, such as FedRAMP Moderate, and drive capacity forecasting and change management processes. Automation will be a key focus: you will deliver and operate platform services using infrastructure-as-code and monitoring-as-code methodologies. You will build and manage service availability, performance, and scalability in production environments to meet business-defined Service Level Agreements (SLAs). Additionally, you will create alerting systems to anticipate potential issues and prepare playbooks to address them, keeping our systems operational at all times.
This role requires a strong background in cloud operations engineering leadership, particularly within SaaS companies, and a deep understanding of operating workloads in highly regulated environments. You will be responsible for recruiting, managing, and inspiring a proficient cloud engineering and SRE team, while also collaborating with various stakeholders to ensure the successful development and deployment of high-quality products.

Responsibilities

  • Lead all operations and SRE functions within the US organization, including incident response and monitoring.
  • Own and drive compliance and evidence-gathering activities for regulated deployments such as FedRAMP Moderate.
  • Drive capacity forecasting and change management processes.
  • Implement automation for delivery and operations of platform services using infrastructure-as-code and monitoring-as-code.
  • Build and manage service availability, performance, and scalability in production environments to meet business-defined SLAs.
  • Collaborate with the development organization to manage microservices at scale on the platform.
  • Set clear SLOs to meet or exceed SLAs.
  • Ensure systems are operational and create alert systems to foresee potential issues.
  • Monitor dashboards and prepare playbooks for anticipated problems.
  • Collaborate with application and business stakeholders to ensure high-quality product development and deployment.
  • Work closely with architecture and security teams to implement enterprise-grade practices.
  • Recruit, manage, and inspire a proficient cloud engineering and SRE team.

Requirements

  • Degree in Computer Science or Engineering or equivalent professional experience.
  • 10+ years in cloud operations engineering leadership roles in SaaS companies.
  • 5+ years in a senior management/leadership role, leading large SRE and Cloud Operations teams.
  • Experience operating workloads in a secured, highly regulated environment such as FedRAMP.
  • Deep understanding of and experience with major Cloud Service Providers and cloud-native technologies (Docker, Kubernetes, Istio, Kafka).
  • Experience with modern CI/CD and automation tools (Jenkins, Ansible, Terraform).
  • Experience building, scaling, and monitoring infrastructure for SaaS applications and services.
  • Experience with APM and infrastructure monitoring tools (Datadog, New Relic, Sumo Logic, Splunk, Dynatrace).
  • Experience managing 24x7 on-call rotation teams for global customers.
  • Experience creating a strong customer-focused SRE-driven operations culture.
  • Excellent interpersonal and communication skills.
  • Knowledge of lean and agile software engineering best practices.

Nice-to-haves

  • Multilingual capabilities
  • Experience with additional cloud service providers
  • Familiarity with security compliance frameworks beyond FedRAMP

Benefits

  • 401(k)
  • Dental insurance
  • Disability insurance
  • Flexible schedule
  • Health insurance
  • Life insurance
  • Paid holidays
  • Unlimited paid time off
  • 11 company-paid holidays
  • 3 extra days for volunteering
  • Hybrid work model
  • Free access to professional training platforms
  • Up to 16 weeks of paid leave for birthing parents/primary caregivers
  • 6 weeks of paid leave for secondary caregivers
  • 401(k) plan with up to 4% company matching contributions
  • Bonuses for referring successful hires after three months of continuous employment