Winmax Systems - Seattle, WA
posted 3 months ago
We are seeking a detail-oriented Machine Learning Data Engineer to join our team. As an ML Data Engineer, you will design, build, and maintain scalable data pipelines that ingest, transform, and load data from various sources into our cloud-based systems. You will work closely with machine learning teams to ensure that data is accurate, enriched, reliable, and readily available for analytics and model training. This role is crucial to supporting the data needs of our machine learning initiatives and to keeping our data infrastructure robust and efficient.

In this position, you will create efficient, reliable, streamable, and scalable data pipelines using industry-standard tools and techniques such as TorchData, WebDataset, Apache Parquet, Python, and SQL. You will develop strategies for ingesting data from various data providers while maintaining data quality and consistency throughout the process. You will also implement parallel pre-processing to clean, transform, de-duplicate, combine, and normalize data, which is essential for maintaining high-quality datasets, and you will curate, augment, and enrich existing datasets to improve data quality and provide valuable insights to stakeholders.

You will collaborate with synthetic data teams to generate synthetic data and incorporate it into existing pipelines. Working closely with ML scientists, engineers, and product teams, you will understand data requirements and collaborate on data delivery to meet project goals.

You will monitor the performance of data pipelines, identify errors and bottlenecks, and carry out regular maintenance and updates. You will stay current with the latest trends in data engineering and incorporate best practices into our data pipelines so that our systems remain cutting-edge. Finally, you will document data pipelines, settings, and procedures for easy maintenance and knowledge sharing within the team.