Capgemini - Bridgewater, VA

Posted 20 days ago

Full-time - Mid Level
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Python/PySpark Developer role at Capgemini sits within a data engineering team that builds scalable, distributed data processing systems. The position centers on developing, optimizing, and maintaining ETL pipelines with PySpark, collaborating with data scientists and engineers on efficient pipeline design, and ensuring data quality across large datasets.

Responsibilities

  • Collaborate with data scientists and engineers to design and implement efficient data pipelines.
  • Develop, optimize, and maintain ETL pipelines using PySpark for large-scale datasets (a minimal sketch follows this list).
  • Design and implement complex data transformation logic using PySpark and other Big Data tools.
  • Work with various Big Data technologies such as Hadoop, Hive, HBase, Kafka, and Spark.
  • Integrate data from multiple sources to create unified datasets.
  • Write efficient, reusable, and scalable Python code for batch and real-time data processing tasks.
  • Ensure data quality, consistency, and reliability through data validation and monitoring.
  • Fine-tune and optimize PySpark jobs for improved performance in distributed environments.
  • Manage and maintain data flows in HDFS for scalability and fault tolerance.
  • Perform data extraction, aggregation, and reporting using SQL and NoSQL databases.
  • Participate in system design discussions and provide recommendations for architecture and performance improvements.
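
For illustration only, here is a minimal sketch of the kind of PySpark ETL pipeline described above. The paths, table, and column names (orders, customer_id, order_date, amount) are hypothetical placeholders, not details from this posting:

  # Minimal PySpark ETL sketch: extract, transform, validate, load.
  # All paths and column names below are hypothetical placeholders.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

  # Extract: read raw CSV orders from HDFS (placeholder path).
  orders = spark.read.option("header", True).csv("hdfs:///raw/orders")

  # Transform: cast types and aggregate spend per customer per day.
  daily = (
      orders
      .withColumn("amount", F.col("amount").cast("double"))
      .groupBy("customer_id", "order_date")
      .agg(F.sum("amount").alias("daily_total"))
  )

  # Validate: keep only rows that pass basic quality checks.
  valid = daily.filter(F.col("daily_total").isNotNull() & (F.col("daily_total") >= 0))

  # Load: write partitioned Parquet back to HDFS for downstream use.
  valid.write.mode("overwrite").partitionBy("order_date").parquet("hdfs:///curated/daily_totals")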

Requirements

  • Hands-on experience in Big Data technologies, particularly with PySpark.
  • Strong background in building scalable distributed data processing systems.
  • Proficiency in writing efficient, reusable, and scalable Python code.
  • Experience with data validation, monitoring, and error handling.
  • Ability to fine-tune and optimize PySpark jobs in distributed environments (illustrated in the sketch after this list).
  • Familiarity with managing data flows in HDFS.
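
As a rough illustration of the tuning work mentioned above, the sketch below uses common PySpark performance levers: broadcasting a small dimension table to avoid a shuffle-heavy join, repartitioning by the aggregation key, and caching a reused DataFrame. Table names, paths, and the partition count are hypothetical:

  # Sketch of common PySpark performance levers; names and numbers are hypothetical.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

  events = spark.read.parquet("hdfs:///curated/events")  # large fact table
  dims = spark.read.parquet("hdfs:///curated/dims")      # small dimension table

  # Broadcast the small table so the join avoids a full shuffle.
  joined = events.join(F.broadcast(dims), on="dim_id", how="left")

  # Repartition by the grouping key to balance work across executors.
  joined = joined.repartition(200, "dim_id")

  # Cache the DataFrame because it feeds multiple downstream actions.
  joined.cache()

  summary = joined.groupBy("dim_id").agg(F.count("*").alias("event_count"))
  summary.write.mode("overwrite").parquet("hdfs:///reports/events_by_dim")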

Nice-to-haves

  • Experience with additional Big Data tools and technologies.
  • Knowledge of data engineering best practices.
  • Familiarity with cloud-based data solutions.

Benefits

  • Flexible work arrangements.
  • Healthcare coverage, including dental, vision, and mental health programs.
  • Financial well-being programs such as a 401(k) plan and an Employee Share Ownership Plan.
  • Paid time off and paid holidays.
  • Paid parental leave.
  • Family-building benefits such as adoption and surrogacy assistance.
  • Social well-being benefits such as subsidized backup child/elder care and tutoring.
  • Mentoring, coaching, and learning programs.
  • Employee Resource Groups.