TikTok - New York, NY

posted 27 days ago

Full-time - Mid Level
New York, NY
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

TikTok is the leading destination for short-form mobile video, with a mission to inspire creativity and bring joy. The Generative AI team under Monetization Technology is focused on developing cutting-edge Generative AI technologies across various modalities, including text, image, video, and landing pages. This team is dedicated to creating industry-leading technical solutions that enhance creative efficiency for advertisers, agencies, and creators. By leveraging Generative AI technologies, the team aims to automate creative workflows and increase overall revenue for clients and creators alike.

As a Machine Learning Engineer specializing in Data Curation, you will play a crucial role in building and maintaining efficient, low-latency data pipelines in collaboration with foundational model researchers. Your responsibilities will include designing and implementing robust systems for data management, supporting the foundational training of models across various formats in distributed environments. You will also be tasked with developing caching mechanisms to improve data retrieval speeds and enhance model responsiveness, as well as implementing data insights and model evaluation pipelines to drive user engagement and revenue growth. Staying updated with the latest academic research and open-source advancements will be essential to continuously improve data operations and machine learning model performance.

This position offers an exciting opportunity to work in a dynamic environment where challenges are viewed as opportunities for learning, innovation, and growth. TikTok fosters a culture of collaboration and creativity, making it an ideal place for individuals who are eager to make a significant impact in the generative AI space.

Responsibilities

  • Collaborate with foundational model researchers, including specialists in Ads LLM, Text-to-Image, and Text-to-Video, to develop and maintain efficient, low-latency data pipelines.
  • Design and implement robust, scalable systems for data curation and management, supporting the foundational training of models across various formats in distributed environments.
  • Implement data insights and model evaluation pipelines to enhance user engagement and drive revenue growth.
  • Develop caching mechanisms to improve data retrieval speeds and enhance model responsiveness.
  • Stay abreast of the latest academic research and open-source advancements, integrating cutting-edge technologies to continuously improve our data operations and machine learning model performance.

Requirements

  • B.S./M.S./Ph.D. in Computer Science, Computer Engineering, or a related field.
  • Expertise in Python, including asyncio, and a strong foundation in deep learning frameworks such as PyTorch and large-model training libraries such as FSDP and DeepSpeed.
  • A minimum of 3 years' experience with Linux, Docker, and Kubernetes.
  • Demonstrated capability in data curation, management, and optimization within Generative AI ecosystems, encompassing both streaming and batch data processing.
  • Thorough understanding of machine learning frameworks and parallel data processing techniques, and proficiency with large language models (e.g., Llama series), text-to-image models (e.g., diffusion-based models, Diffusion Transformers), and text-to-video technologies (e.g., EMU series, MagViT).

Nice-to-haves

  • Experience in CUDA Optimization and a deep understanding of the application of Generative AI models across multiple domains.
  • Significant experience in managing large-scale data systems, with a strong preference for those who have worked with Vector Database solutions.
  • Proficiency in cloud services (AWS/GCP) and familiarity with machine learning training, deployment, and distributed computing frameworks like Spark.
  • Exceptional communication, teamwork, and project management skills.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents, and a Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for Health Care, Limited Purpose, and Dependent Care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) and 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.