Vimeo - New York, NY

posted 2 months ago

Full-time - Manager
Remote - New York, NY
Food Services and Drinking Places

About the position

At Vimeo, our mission is to help businesses drive impact through video. With a vibrant community of video professionals and the vast volume of video consumption and uploads on our platform, data plays a crucial role in our success. Vimeo supports about 200 million registered video creators, billions of monthly video views, and hundreds of millions of monthly active users. We seek a talented Data Platform Engineer to enhance the reliability of our data platforms and pipelines, which process billions of events and terabytes of data daily.

As a Sr. Manager Data Platform Engineer at Vimeo, you will be a critical leader in shaping and guiding our data platform strategy. You will collaborate closely with various data engineering teams to ensure the reliability, performance, and scalability of our data platforms. You will also drive our incident management processes, conduct post-mortem analyses, and implement preventive measures to avoid recurrence. If you are passionate about data reliability, scalability, automation, and leadership, this is an exciting opportunity to make a significant impact.

Responsibilities

  • Provide leadership and direction to a team of data engineers, fostering a culture of collaboration, continuous improvement, and technical excellence.
  • Mentor and develop team members, promoting career growth and ensuring the team has the skills and tools necessary to succeed.
  • Partner with engineering, product, and business teams to align data platform strategies with overall business objectives.
  • Act as the primary point of contact for data platform-related initiatives, effectively communicating with stakeholders at all levels of the organization.
  • Work with engineering teams to enhance, maintain, and performance-tune Vimeo's data platforms and infrastructure, and to plan their capacity.
  • Design and implement business continuity and disaster recovery plans in collaboration with engineering teams.
  • Lead the incident management process for our data platforms, including conducting post-mortems and root cause analyses and implementing preventive measures.
  • Drive and standardize the change and release management process, promoting best practices across engineering teams to ensure legal compliance.
  • Develop and maintain intelligent monitoring systems over data pipelines and infrastructure to enable early and automated anomaly detection.
  • Collaborate with software developers to build an end-to-end automated testing framework and a system-level testing environment.
  • Participate in an on-call rotation to provide round-the-clock support for critical incidents.

Requirements

  • Proven experience in leading and managing engineering teams, with a track record of driving successful projects and initiatives.
  • Excellent communication skills, with the ability to effectively convey technical concepts to both technical and non-technical stakeholders.
  • Ability to think strategically about platform architecture and how it aligns with broader business objectives.
  • Production experience with distributed data stores (e.g., HBase, ZooKeeper, Kafka) and the ability to own, manage, monitor, and optimize the reliability and health of development and production environments.
  • Strong problem-solving skills, with a clear sense of ownership and drive.
  • 4+ years of experience in a Linux environment, with proficiency in cloud platforms (AWS, GCP).
  • Experience with container orchestration platforms, particularly Kubernetes, for managing and deploying data processing and analysis applications.
  • Proficiency in Java (mandatory); proficiency in additional languages such as Python or Scala is also valued.
  • 4+ years of hands-on experience in Reliability Engineering for scalable, high-performance, distributed data systems with a focus on automation.
  • Experience with configuration management systems like Chef, Puppet, Ansible, or Terraform.
  • Deep understanding of CI/CD principles and familiarity with source control systems like Git.
  • Ability to work with peer SREs to roll out changes in production environments and mitigate data-related production incidents.

Nice-to-haves

  • Experience with Change Data Capture systems like Debezium and familiarity with data warehousing and engineering.

Benefits

  • Bonus or commission
  • Restricted Stock Units (RSUs)
  • Paid time off
  • Generous 401k match
  • Wellbeing resources
  • Dynamic and collaborative work environment