Research Scientist in ML Systems

$145,000 - $355,000/Yr

ByteDance - San Jose, CA

posted about 2 months ago

Full-time - Mid Level
San Jose, CA
Professional, Scientific, and Technical Services

About the position

As a Research Scientist in Machine Learning Systems at ByteDance, you will play a pivotal role in advancing our machine learning capabilities. Founded in 2012, ByteDance's mission is to inspire creativity and enrich life through innovative products like TikTok, Helo, and Resso. Our team is dedicated to building a global platform for creation and communication, leveraging cutting-edge technologies in machine learning, computer vision, natural language processing, and more. You will have the opportunity to work on world-class projects that impact hundreds of millions of users worldwide.

In this role, you will be responsible for researching and developing our machine learning systems, focusing on heterogeneous computing architecture, management, and monitoring. You will deploy these systems, manage distributed task scheduling, and oversee machine learning training and inference processes. Additionally, you will be tasked with optimizing the integration of AI algorithms and hardware, including GPUs, FPGAs, and ASICs, to enhance our machine learning capabilities.

At ByteDance, we believe that every challenge is an opportunity for growth and innovation. We foster a collaborative environment where creativity thrives, and we are committed to building a diverse and inclusive workplace. Join us in our mission to create impactful solutions that enrich lives and inspire creativity.

Responsibilities

  • Research and develop machine learning systems, including heterogeneous computing architecture, management, and monitoring.
  • Deploy machine learning systems, including distributed task scheduling, machine learning training, and inference.
  • Manage cross-layer optimization across systems, AI algorithms, and hardware for machine learning (GPU, FPGA, ASIC).

Requirements

  • Master's degree or above, with a solid grasp of distributed or parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies.
  • Familiarity with machine learning algorithms and platforms.
  • Basic understanding of how GPUs, FPGAs, and ASICs work.
  • Expertise in at least one of the following programming languages in a Linux environment: C/C++, CUDA, Python.

Nice-to-haves

  • Experience with GPU-based high-performance computing and RDMA high-performance networking (NCCL).
  • Familiarity with deep learning frameworks such as TensorFlow, JAX, or PyTorch.
  • Experience in large-scale data processing and parallel computing.
  • Experience in designing and operating large-scale systems in cloud computing or machine learning.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% for dependents.
  • Health Savings Account (HSA) with company match.
  • Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for healthcare and dependent care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO).
  • 10 paid sick days per year.
  • 12 weeks of paid Parental Leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match.
  • Gym and cellphone service reimbursements.