Research Scientist in ML Systems

$137,750 - $237,500/Yr

ByteDance - Seattle, WA

posted 3 months ago

Full-time - Mid Level
Seattle, WA
Professional, Scientific, and Technical Services

About the position

As a Research Scientist in Machine Learning Systems at ByteDance, you will be at the forefront of developing and implementing cutting-edge technologies in the field of machine learning. The AML (Applied Machine Learning) Machine Learning System team is dedicated to creating high-performance, reliable, and scalable systems that enhance the capabilities of machine learning applications.

Your role will involve extensive research and development of our machine learning systems, focusing on heterogeneous computing architecture, management, monitoring, and deployment. You will engage in distributed task scheduling, machine learning training, and inference, while also optimizing AI algorithms across various layers of the system. This position offers a unique opportunity to work with advanced hardware for machine learning, including GPUs, FPGAs, and ASICs, ensuring that our systems run stably and reliably. In this role, you will be responsible for integrating large-scale heterogeneous systems that utilize GPU, RDMA, and storage technologies.

You will have the chance to enrich your expertise in coding, performance improvement, and problem analysis, while also being involved in the decision-making processes that shape our machine learning systems. ByteDance values creativity and innovation, and as part of our team, you will contribute to a culture that encourages learning and growth, tackling challenges with courage and a commitment to excellence.

Responsibilities

  • Research and develop machine learning systems focusing on heterogeneous computing architecture.
  • Manage, monitor, and deploy machine learning systems effectively.
  • Implement distributed task scheduling for machine learning training and inference.
  • Optimize AI algorithms across various layers of the system.
  • Work with hardware for machine learning, including GPU, FPGA, and ASIC technologies.

Requirements

  • Master's degree or higher, with a grounding in distributed or parallel computing principles.
  • Familiarity with recent advances in computing, storage, networking, and hardware technologies.
  • Understanding of machine learning algorithms and platforms.
  • Basic knowledge of GPU, FPGA, and ASIC operations.
  • Proficiency in at least one programming language in a Linux environment, such as C/C++, Go, or Python.

Nice-to-haves

  • Experience with GPU-based high-performance computing and RDMA high-performance networking.
  • Familiarity with deep learning frameworks such as TensorFlow, Caffe, MXNet, or PyTorch.
  • Experience in large-scale data processing and parallel computing.
  • Background in designing and operating large-scale systems in cloud computing or machine learning.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% for dependents.
  • Health Savings Account (HSA) with company match.
  • Dental and vision insurance coverage.
  • Short- and long-term disability insurance.
  • Basic life, voluntary life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for healthcare and dependent care.
  • 10 paid holidays per year and 17 days of Paid Personal Time Off (PPTO).
  • 10 paid sick days per year.
  • 12 weeks of paid parental leave and 8 weeks of paid supplemental disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.