Lenovo - Morrisville, NC

Full-time - Entry Level
Morrisville, NC
10,001+ employees
Computer and Electronic Product Manufacturing

About the position

At Lenovo, we Never Stand Still. Every day, every employee at Lenovo is focused on moving forward, rejecting traditional limits, and always seeking a better way. We are looking for a Data Engineer to join the AI platform team within our Cloud and Software (CSW) Group.

In this role, you will work directly with locally and globally distributed teams responsible for defining, designing, and building robust data pipelines and data-serving platforms in a cloud-based SaaS environment. You will be part of a highly dynamic software development team working on initiatives that transform and enhance the overall business value of our products and platforms.

As a Data Engineer, you will use distributed data processing frameworks to ingest, extract, transform, store, serve, and build data sets for AI applications, data scientists, and ML engineers. You will apply your knowledge of algorithms, pipelines, cloud platforms, AI and ML, data processing, and supporting tools and technologies to develop new data pipeline solutions and to manage and maintain existing ones, improving existing data models and pipelines for a worldwide customer base.

This is a great opportunity if you are passionate about data; have a strong sense of responsibility and ownership; are resourceful in the face of ambiguity; thrive on change; are an independent thinker who can solve complex problems; and are an excellent collaborator with solid communication skills, demonstrated by successful cross-team collaboration.

Responsibilities

  • Develop new data pipelines for data ingestion and transformation.
  • Build/update capabilities of existing data pipelines including real-time streaming and batch processing.
  • Test data pipelines for quality, data integrity, and validity.
  • Take end-to-end ownership of implementing solutions to identified issues with a focus on quality, stability, security, and customer satisfaction.
  • Collaborate with a multidisciplinary, globally distributed team of professionals including Data Scientists, Machine Learning Engineers, Business Analysts, and Project/Product Management.
  • Design, build, implement, and document data models.
  • Work with business partners to understand business and product objectives, identify the data needed to support them, and influence decisions.
  • Optimize data transformation pipelines to improve latency or reduce computational time and cost.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, Engineering, Math, or a related technical field.
  • 1+ years of experience developing and maintaining data processing pipelines using frameworks such as Spark, Hadoop, and Hive.

Nice-to-haves

  • Master's degree is a plus.
  • Experience with programming languages such as Groovy and Python (preferred).
  • Professional experience in data engineering and/or building scalable streaming or batch data pipelines.
  • Experience with data engineering tooling for collection, cleaning, transformation, ingestion, storage, and publishing.
  • Advanced SQL skills (such as window functions, defining UDFs).
  • Experience working with relational and NoSQL databases, as well as streaming platforms such as Kafka.
  • Knowledge of cloud technologies and concepts is preferred, especially Athena, QuickSight, and Q.
  • Familiarity with version control systems, CI/CD practices, and testing.
  • Experience with data discovery, lineage, data governance, data orchestration, and measuring data quality metrics is a plus.
  • Experience working with machine learning engineers, data scientists, and ML applications is a plus.
  • Familiarity with Angular and other UI frameworks is a plus.