ByteDance - Seattle, WA

posted about 2 months ago

Full-time - Intern
Seattle, WA
Professional, Scientific, and Technical Services

About the position

The ByteDance Doubao (Seed) Team, established in 2023, is dedicated to building industry-leading AI foundation models with a focus on fostering both technological and social progress. The team conducts research in areas including natural language processing (NLP), computer vision (CV), and speech recognition and generation. With labs and researcher roles in China, Singapore, and the US, the team leverages substantial data and computing resources to build proprietary general-purpose models with multimodal capabilities. In the Chinese market, Doubao models power over 50 ByteDance apps and business lines, including Doubao, Coze, and Dreamina, and are available to external enterprise clients through Volcano Engine. The Doubao app is recognized as the most used AIGC app in China.

Joining ByteDance means being part of a mission to inspire creativity and enrich life. The company treats every challenge as an opportunity for learning, innovation, and growth.

The AML Machine Learning Systems team provides end-to-end machine learning experiences and resources, building heterogeneous ML training and inference systems based on GPUs and AI chips. The team advances the state of the art in ML systems technology, focusing on hardware acceleration for AI and cloud computing, and has published papers at top-tier conferences including SIGCOMM, NSDI, EuroSys, OSDI, SOSP, MLSys, and NeurIPS.

Internships at ByteDance are designed to give students industry exposure and hands-on experience, allowing them to turn their ambitions into reality. The program runs for 12 weeks, beginning in May/June 2025, and includes a vibrant blend of social events and development workshops for personal and professional growth. Candidates can apply to a maximum of two positions; applications are reviewed on a rolling basis, so early applications are encouraged.

Responsibilities

  • Research and develop machine learning systems, including heterogeneous computing architecture, management, scheduling, and monitoring.
  • Drive cross-layer optimization across AI algorithms, systems, and hardware for machine learning (GPU, ASIC).
  • Implement both general-purpose training framework features and model-specific optimizations (e.g., LLMs, diffusion models).
  • Improve the efficiency and stability of extremely large-scale distributed training jobs.

Requirements

  • Currently enrolled in a PhD program focusing on distributed and parallel computing principles, with knowledge of recent advances in computing, storage, networking, and hardware technologies.
  • Familiarity with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
  • Basic understanding of GPU and/or ASIC operations.
  • Expertise in at least one of the following programming languages in a Linux environment: C/C++, CUDA, Python.
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment.

Nice-to-haves

  • Experience with GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs).
  • Familiarity with distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, GSPMD.
  • Knowledge of AI compiler stacks such as torch.fx, XLA, and MLIR.
  • Experience in large-scale data processing and parallel computing.
  • Experience in designing and operating large-scale systems in cloud computing or machine learning.
  • In-depth experience with CUDA programming and performance tuning (CUTLASS, Triton).

Benefits

  • Hourly rate of $57.
  • 100% premium coverage for full-time intern medical insurance after 90 days from the date of hire (medical coverage only, no dental or vision coverage).
  • Paid holidays and paid sick leave, with sick leave entitlement based on the time of joining.
  • Mental and emotional health benefits through the Employee Assistance Program.
  • Reimbursements for mobile phone expenses.