CoreWeave - New York, NY

posted 7 days ago

Full-time - Entry Level
Remote - New York, NY
Professional, Scientific, and Technical Services

About the position

The Cloud Operations Engineer at CoreWeave plays a crucial role in maintaining the performance and availability of the cloud platform, which supports AI-driven solutions. This position involves working in a remote operations center, responding to incidents, and collaborating with various teams to ensure operational readiness. The role is designed for individuals who thrive in dynamic environments and are eager to tackle complex challenges in cloud operations.

Responsibilities

  • Proactively identify performance and availability issues in production with a customer-first mindset.
  • Interpret operational and observability data to assess system performance and adherence to Service Level Objectives.
  • Investigate, validate, and triage alerts and incidents.
  • Develop and maintain dashboarding and alerting to provide insight into customer experience.
  • Provide Tier 2 support for internal and customer-facing services.
  • Act autonomously to initiate and coordinate responses to priority incidents as Incident Commander.
  • Participate in and/or conduct incident post-mortems and draft Post Incident Review documents.
  • Identify opportunities and implement solutions to improve response processes.
  • Partner with SRE and Service Owning teams to ensure operational readiness for services and applications.
  • Create and maintain knowledge articles and documentation.

Requirements

  • Broad technology and troubleshooting skills with a desire to expand knowledge in networking, storage, Kubernetes, automation, and observability.
  • Experience in a support capacity with a broad understanding of modern applications and infrastructure.
  • Ability to manage communication and coordinate multiple engineers during an incident.
  • Experience with automation, or a desire to learn it.
  • Comfortable working on the Linux CLI with a foundational understanding of scripting including conditionals, variables, and loop structures.
  • Experience in open source environments.

Nice-to-haves

  • Experience using observability data to visualize service health and diagnose performance issues.
  • Excitement to help bootstrap a new team and contribute to developing scalable processes.
  • Openness to feedback, coaching, and active participation in team improvement.

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Tuition Reimbursement
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption