ByteDance - San Jose, CA

posted about 1 month ago

Full-time - Mid Level
San Jose, CA
Professional, Scientific, and Technical Services

About the position

ByteDance is seeking a Machine Learning Engineer to join our AML team, which is dedicated to advancing the next-generation AI infrastructure and recommendation platform for ads ranking, search ranking, and e-commerce ranking. This role is pivotal in supporting our mission to drive substantial impact on the core businesses of the company.

As a Machine Learning Engineer, you will be responsible for designing and implementing a global-scale machine learning system that enhances the usability and flexibility of our machine learning infrastructure. You will also work on improving the workflow of model training and serving, data pipelines, storage systems, and resource management for multi-tenancy machine learning systems. Additionally, you will design and develop key components of the ML infrastructure and mentor interns, fostering a collaborative and innovative environment.

At ByteDance, we believe that every challenge is an opportunity for learning, innovation, and growth. We are committed to creating a workplace that inspires creativity and enriches life, and we encourage our team members to embrace challenges with courage and a willingness to learn. By joining us, you will be part of a team that values collaboration and strives to make a meaningful impact on our users and the company as a whole. We are looking for individuals who are not only technically proficient but also possess a strong sense of responsibility, excellent communication skills, and a passion for continuous learning.

Responsibilities

  • Design and implement a global-scale machine learning system for feeds, ads, and search ranking models.
  • Improve the usability and flexibility of the machine learning infrastructure.
  • Enhance the workflow of model training and serving, data pipelines, storage systems, and resource management for multi-tenancy machine learning systems.
  • Design and develop key components of ML infrastructure and mentor interns.

Requirements

  • Proficient in at least one programming language such as Go or Python in a Linux environment, with excellent coding skills.
  • Familiar with open-source distributed scheduling, orchestration, and storage frameworks, such as Kubernetes, YARN, Mesos, Celery, HDFS, Redis, S3, etc.
  • Mastery of the principles of distributed systems, with experience in the design, development, and maintenance of large-scale distributed systems.
  • Excellent logical analysis skills, with the ability to abstract and decompose business logic sensibly.
  • A strong sense of responsibility, good learning and communication skills, and self-motivation, with the ability to respond and act quickly.
  • Good documentation habits, with the ability to write and update workflow and technical documents in a timely manner.

Nice-to-haves

  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
  • Experience with big data frameworks (e.g., Spark/Hadoop/Flink), including resource management and task scheduling for large-scale distributed systems.
  • Experience using or designing open-source machine learning lifecycle management systems, such as TFX.

Benefits

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents.
  • Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short/Long term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • Flexible Spending Account (FSA) options for Health Care, Limited Purpose, and Dependent Care.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) and 10 paid sick days per year.
  • 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through EAP and Lyra.
  • 401(k) company match, plus gym and cellphone service reimbursements.