TikTok - San Jose, CA
posted about 2 months ago
TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. The Data Engine team supports core business lines within TikTok, as well as external enterprise customers through Volcano Engine. The team addresses big data architecture challenges for a 10 EB-scale dataset, with the aim of building industry-leading big data infrastructure. It also provides cloud-native real-time data lake and data warehouse services to business customers through its LAS (LakeHouse Analytics Service) product.

As a member of the Data Engine team, you will collaborate with a highly skilled team to build cutting-edge big data infrastructure and architecture. You will dive deep into source-code optimizations of major big data systems and represent TikTok at top-level conferences in the big data field, sharing the team's technical milestones and achievements. This role is pivotal in building a long-term competitive advantage for the data engine and keeping TikTok at the forefront of big data technology.

The ideal candidate is familiar with the principles and source code of one or more mainstream big data systems such as Spark, Presto, Flink, Hive, and Hudi. You will also need a strong understanding of data lake technologies, including Iceberg, Hudi, and Delta Lake, as well as the ability to diagnose failures and optimize performance in large-scale systems. Being a committer in a major open-source data project such as Spark, Flink, Hudi, Iceberg, Presto, StarRocks, Kafka, or Calcite is preferred.