Capgemini - Hanover, NJ

posted 20 days ago

Full-time - Mid Level
Hanover, NJ
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Python/PySpark Developer role at Capgemini involves working within a dynamic data engineering team to build scalable distributed data processing systems. The position requires collaboration with data scientists and engineers to design and implement efficient data pipelines, focusing on processing large datasets using PySpark and other Big Data technologies.

Responsibilities

  • Collaborate with data scientists and engineers to design and implement efficient data pipelines.
  • Develop, optimize, and maintain ETL pipelines using PySpark to process large-scale datasets across distributed environments (see the sketch after this list).
  • Design and implement complex data transformation logic using PySpark and other Big Data tools.
  • Work with various Big Data technologies such as Hadoop, Hive, HBase, Kafka, and Spark to build robust, scalable data systems.
  • Ensure data quality, consistency, and reliability by implementing data validation, monitoring, and error handling.
  • Fine-tune and optimize PySpark jobs to improve performance in distributed environments.
  • Manage and maintain data flows in HDFS, ensuring scalability and fault tolerance.
  • Perform data extraction, aggregation, and reporting using SQL and NoSQL databases.
  • Participate in system design discussions and provide recommendations for architecture and performance improvements.
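
For illustration only, below is a minimal sketch of the kind of PySpark ETL job these responsibilities describe: extract raw data from HDFS, apply validation and transformation logic, and load partitioned output back to HDFS. The paths, column names, and event schema are hypothetical assumptions, not details from this posting.

    # Hypothetical example: paths, columns, and the "events" schema are
    # illustrative assumptions, not taken from the job posting.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: read raw JSON event data from HDFS (placeholder path).
    raw = spark.read.json("hdfs:///data/raw/events")

    # Transform: basic validation (drop records missing a timestamp),
    # then derive a date column and aggregate per day and event type.
    clean = (
        raw.filter(F.col("event_time").isNotNull())
           .withColumn("event_date", F.to_date("event_time"))
    )
    daily_counts = (
        clean.groupBy("event_date", "event_type")
             .agg(F.count("*").alias("event_count"))
    )

    # Load: write partitioned Parquet back to HDFS; repartitioning on the
    # partition column consolidates output files within each partition.
    (
        daily_counts.repartition("event_date")
                    .write.mode("overwrite")
                    .partitionBy("event_date")
                    .parquet("hdfs:///data/curated/daily_event_counts")
    )

    spark.stop()

Tuning a job like this (partition counts, broadcast joins, caching) is the kind of work the performance-optimization responsibilities above refer to.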

Requirements

  • Hands-on experience in Big Data technologies, particularly with PySpark.
  • Strong background in building scalable distributed data processing systems.
  • Proficiency in writing efficient, reusable, and scalable Python code for batch and real-time data processing tasks.
  • Experience with data validation, monitoring, and error handling to ensure data quality and reliability.
  • Ability to fine-tune and optimize PySpark jobs for performance in distributed environments.
  • Familiarity with managing data flows in HDFS and ensuring fault tolerance.

Nice-to-haves

  • Experience with additional Big Data tools and frameworks.
  • Knowledge of data warehousing concepts and practices.
  • Familiarity with cloud platforms and services related to data processing.

Benefits

  • Flexible work arrangements
  • Healthcare, including dental, vision, mental health, and well-being programs
  • Financial well-being programs such as a 401(k) and an Employee Share Ownership Plan
  • Paid time off and paid holidays
  • Paid parental leave
  • Family-building benefits like adoption assistance, surrogacy, and cryopreservation
  • Social well-being benefits like subsidized back-up child/elder care and tutoring
  • Mentoring, coaching, and learning programs
  • Employee Resource Groups
  • Disaster relief