ByteDance - San Jose, CA

posted about 1 month ago

Full-time - Entry Level
San Jose, CA
Professional, Scientific, and Technical Services

About the position

As a Research Engineer Graduate specializing in Machine Learning at ByteDance, you will be at the forefront of developing the large-scale machine learning systems that power our products. Founded in 2012, ByteDance has a mission to inspire creativity and enrich life through technology. Our Doubao Team, established in 2023, is dedicated to building advanced large language models (LLMs) and leading global research in AI. This role offers the opportunity to work on cutting-edge technologies in natural language processing, computer vision, and speech recognition, contributing to applications that enhance user interaction and communication.

In this position, you will design and develop the architecture of large-scale machine learning systems. You will tackle technical challenges in high concurrency, reliability, and scalability so that our systems can handle the growing demand for intelligent interactions. Your work will span the main areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration. You will also research and introduce advanced technologies, such as the latest hardware architectures and compiler-based optimization techniques, to improve these systems. Collaboration is key: you will work closely with algorithm teams to optimize algorithms and systems jointly.

This position is ideal for PhD graduates eager to start their careers in a dynamic environment that values creativity, innovation, and teamwork. Successful candidates will have the chance to explore limitless growth opportunities and co-create a future driven by inspiration with TikTok.

Responsibilities

  • Design and develop the architecture of large-scale machine learning systems.
  • Solve technical challenges related to the high concurrency, high reliability, and high scalability of the system.
  • Cover various sub-directions of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
  • Research and introduce advanced technologies in machine learning systems, such as the latest hardware architectures and compiler-based optimization techniques.
  • Work closely with algorithm teams to optimize algorithms and systems jointly.

Requirements

  • PhD graduate with a background in Computer Science or a related technical field, or equivalent industrial research experience.
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment.

Nice-to-haves

  • Prior experience with large-scale projects, or influential papers, in the field of large models.
  • Familiarity with at least one mainstream machine learning framework (TensorFlow, PyTorch, or JAX).
  • Experience in fields such as CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, ML Hardware Architecture, ML for Systems, and Distributed Storage.
  • Mastery of the principles of distributed systems and experience in the design, development, and maintenance of large-scale distributed systems.
  • Demonstrated technical experience from internships, work experience, coding competitions, or publications.
  • Curiosity towards new technologies and entrepreneurship.
  • High levels of creativity and quick problem-solving capabilities.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents.
  • Health Savings Account (HSA) with company match.
  • Dental, Vision, Short/Long term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for Health Care, Limited Purpose, and Dependent Care.
  • 10 paid holidays, 17 days of Paid Personal Time Off (PPTO), and 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.