ByteDance - San Jose, CA

posted 3 months ago

Full-time - Entry Level
San Jose, CA
Professional, Scientific, and Technical Services

About the position

As a Research Scientist in Applied Machine Learning at ByteDance, you will be at the forefront of developing and enhancing large-scale machine learning systems that power our innovative products. Your primary responsibility will be to design and develop the architecture of these systems, addressing technical challenges such as high concurrency, reliability, and scalability. You will work on various sub-directions of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration. This role requires a deep understanding of advanced technologies in machine learning systems, including the latest hardware architectures and compiler-based optimization technologies. You will collaborate closely with algorithm teams to optimize both algorithms and systems, ensuring that our products meet the growing demand for intelligent interaction and improve users' lifestyles and communication methods.

The Doubao Team, which you will be a part of, is dedicated to crafting the industry's most advanced large language models (LLMs). With a commitment to technological and social progress, the team conducts research in natural language processing, computer vision, and speech recognition. Leveraging substantial data and computing resources, the team has built a proprietary general-purpose model with multimodal capabilities, supporting over 50 downstream business services. As a member of this team, you will contribute to the development of cutting-edge applications in search, recommendation, advertising, content creation, conversation, and customer service, ultimately driving impact for our users and the company.

Responsibilities

  • Design and develop the architecture of large-scale machine learning systems.
  • Solve technical challenges related to the system's high concurrency, reliability, and scalability.
  • Cover various sub-directions of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
  • Research and introduce advanced technologies in machine learning systems, such as the latest hardware architecture and compiler-based optimization technologies.
  • Work closely with algorithm teams to optimize algorithms and systems jointly.

Requirements

  • PhD graduate with a background in Computer Science or a related technical field, or equivalent industrial research experience.
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment.
  • Familiarity with at least one mainstream machine learning framework (TensorFlow/PyTorch/JAX).
  • Experience in large-scale projects or papers with significant influence in the field of large models.
  • Mastery of the principles of distributed systems and experience in the design, development, and maintenance of large-scale distributed systems.

Nice-to-haves

  • Experience in CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, or ML Hardware Architecture.
  • Curiosity towards new technologies and entrepreneurship.
  • High levels of creativity and quick problem-solving capabilities.

Benefits

  • 401(k) matching
  • AD&D insurance
  • Dental insurance
  • Disability insurance
  • Employee assistance program
  • Flexible spending account
  • Health insurance
  • 10 paid holidays per year
  • 17 days of Paid Personal Time Off (PPTO)
  • 10 paid sick days per year
  • 12 weeks of paid Parental leave
  • 8 weeks of paid Supplemental Disability
  • Mental and emotional health benefits through EAP and Lyra
  • Gym and cellphone service reimbursements