Capgemini - Hanover, NJ

posted 20 days ago

Full-time - Mid Level
Hanover, NJ
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Python/PySpark Developer role at Capgemini involves working within a dynamic data engineering team to build scalable distributed data processing systems. The position requires collaboration with data scientists and engineers to design and implement efficient data pipelines, focusing on processing large datasets using PySpark and other Big Data technologies.

Responsibilities

  • Collaborate with data scientists and engineers to design and implement efficient data pipelines.
  • Develop, optimize, and maintain ETL pipelines using PySpark to process large-scale datasets across distributed environments (see the sketch after this list).
  • Design and implement complex data transformation logic using PySpark and other Big Data tools.
  • Work with various Big Data technologies such as Hadoop, Hive, HBase, Kafka, and Spark to build robust, scalable data systems.
  • Ensure data quality, consistency, and reliability by implementing data validation, monitoring, and error handling.
  • Fine-tune and optimize PySpark jobs to improve performance in distributed environments.
  • Manage and maintain data flows in HDFS, ensuring scalability and fault tolerance.
  • Perform data extraction, aggregation, and reporting using SQL and NoSQL databases.
  • Participate in system design discussions and provide recommendations for architecture and performance improvements.
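
For illustration only, below is a minimal sketch of the kind of PySpark ETL job these responsibilities describe: extract raw data from HDFS, apply validation and transformation logic, and load partitioned output back to HDFS. The paths, column names, and event schema are hypothetical assumptions, not details from this posting.

    # Hypothetical example: paths, columns, and the "events" schema are
    # illustrative assumptions, not taken from the job posting.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: read raw JSON event data from HDFS (placeholder path).
    raw = spark.read.json("hdfs:///data/raw/events")

    # Transform: basic validation (drop records missing a timestamp),
    # then derive a date column and aggregate per day and event type.
    clean = (
        raw.filter(F.col("event_time").isNotNull())
           .withColumn("event_date", F.to_date("event_time"))
    )
    daily_counts = (
        clean.groupBy("event_date", "event_type")
             .agg(F.count("*").alias("event_count"))
    )

    # Load: write partitioned Parquet back to HDFS; repartitioning on the
    # partition column consolidates output files within each partition.
    (
        daily_counts.repartition("event_date")
                    .write.mode("overwrite")
                    .partitionBy("event_date")
                    .parquet("hdfs:///data/curated/daily_event_counts")
    )

    spark.stop()

Tuning a job like this (partition counts, broadcast joins, caching) is the kind of work the performance-optimization responsibilities above refer to.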

Requirements

  • Hands-on experience in Big Data technologies, particularly with PySpark.
  • Strong background in building scalable distributed data processing systems.
  • Proficiency in writing efficient, reusable, and scalable Python code for batch and real-time data processing tasks.
  • Experience with data validation, monitoring, and error handling to ensure data quality and reliability.
  • Ability to fine-tune and optimize PySpark jobs for performance in distributed environments.
  • Familiarity with managing data flows in HDFS and ensuring fault tolerance.

Nice-to-haves

  • Experience with additional Big Data tools and frameworks.
  • Knowledge of data warehousing concepts and practices.
  • Familiarity with cloud platforms and services related to data processing.

Benefits

  • Flexible work arrangements
  • Healthcare, including dental, vision, mental health, and well-being programs
  • Financial well-being programs such as a 401(k) and an Employee Share Ownership Plan
  • Paid time off and paid holidays
  • Paid parental leave
  • Family-building benefits like adoption assistance, surrogacy, and cryopreservation
  • Social well-being benefits like subsidized back-up child/elder care and tutoring
  • Mentoring, coaching, and learning programs
  • Employee Resource Groups
  • Disaster relief