Research Scientist in ML Systems

$137,750 - $237,500/Yr

ByteDance - Seattle, WA

posted 3 months ago

Full-time - Mid Level

Professional, Scientific, and Technical Services

About the position

As a Research Scientist in Machine Learning Systems at ByteDance, you will be at the forefront of developing and implementing cutting-edge technologies in the field of machine learning. The AML (Applied Machine Learning) Machine Learning System team is dedicated to creating high-performance, reliable, and scalable systems that enhance the capabilities of machine learning applications. Your role will involve extensive research and development of our machine learning systems, focusing on heterogeneous computing architecture, management, monitoring, and deployment. You will engage in distributed task scheduling, machine learning training, and inference, while also optimizing AI algorithms across various layers of the system.

This position offers a unique opportunity to work with advanced hardware for machine learning, including GPUs, FPGAs, and ASICs, ensuring that our systems run stably and reliably. In this role, you will be responsible for integrating large-scale heterogeneous systems that utilize GPU, RDMA, and storage technologies. You will have the chance to enrich your expertise in coding, performance improvement, and problem analysis, while also being involved in the decision-making processes that shape our machine learning systems.

ByteDance values creativity and innovation, and as part of our team, you will contribute to a culture that encourages learning and growth, tackling challenges with courage and a commitment to excellence.

Responsibilities

Research and develop machine learning systems focusing on heterogeneous computing architecture.
Manage, monitor, and deploy machine learning systems effectively.
Implement distributed task scheduling for machine learning training and inference.
Optimize AI algorithms across various layers of the system.
Work with hardware for machine learning, including GPU, FPGA, and ASIC technologies.

Requirements

Master's degree or higher, with a solid grasp of distributed or parallel computing principles.
Familiarity with recent advances in computing, storage, networking, and hardware technologies.
Understanding of machine learning algorithms and platforms.
Basic knowledge of GPU, FPGA, and ASIC operations.
Proficiency in at least one programming language in a Linux environment, such as C/C++, Go, or Python.

Nice-to-haves

Experience with GPU-based high-performance computing and RDMA high-performance networking.
Familiarity with deep learning frameworks such as TensorFlow, Caffe, MXNet, or PyTorch.
Experience in large-scale data processing and parallel computing.
Background in designing and operating large-scale systems in cloud computing or machine learning.

Benefits

100% premium coverage for employee medical insurance, approximately 75% for dependents.
Health Savings Account (HSA) with company match.
Dental and vision insurance coverage.
Short- and long-term disability insurance.
Basic life, voluntary life, and AD&D insurance plans.
Flexible Spending Account (FSA) options for healthcare and dependent care.
10 paid holidays per year and 17 days of Paid Personal Time Off (PPTO).
10 paid sick days per year.
12 weeks of paid parental leave and 8 weeks of paid supplemental disability.
Mental and emotional health benefits through EAP and Lyra.
401(k) company match, plus gym and cellphone service reimbursements.
