Photon - Jersey City, NJ

Full-time
Jersey City, NJ
Professional, Scientific, and Technical Services

About the position

The PySpark Data Reconciliation Engineer will design, develop, and test applications that automate data reconciliation across a range of financial data sources, including relational databases, NoSQL databases, batch files, and real-time data streams. The engineer will implement efficient data transformation and matching algorithms, both deterministic and heuristic, using PySpark and related big data frameworks, and will build robust error handling and exception management into Spark jobs to ensure data integrity and system resilience.

The role involves close collaboration with business analysts and data architects to understand data requirements and matching criteria. The engineer will analyze and interpret data structures, formats, and relationships in order to implement effective matching algorithms in PySpark, and will work with distributed datasets in Spark to keep large-scale reconciliation tasks performant.

Another critical component of the position is integrating PySpark applications with rules engines, such as Drools, to implement and execute complex data matching rules. The engineer will develop PySpark code that interacts with the rules engine, manages rule execution, and handles rule-based decision-making.

Finally, the engineer will collaborate with cross-functional teams to identify and analyze data gaps and inconsistencies between systems, design and develop PySpark-based solutions to data integration challenges, and contribute to the organization's data governance and quality frameworks.
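For candidates unfamiliar with the domain, the deterministic and heuristic matching described above might look like the following minimal PySpark sketch. This is illustrative only, not this team's actual pipeline; the source paths and the trade_id, counterparty, and amount columns are hypothetical.

    # Illustrative reconciliation sketch. All table paths and column
    # names below are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("recon-sketch").getOrCreate()

    # Two hypothetical source extracts to reconcile.
    ledger = spark.read.parquet("/data/ledger")
    payments = spark.read.parquet("/data/payments")

    # Deterministic pass: exact match on the shared business key and amount.
    matched = ledger.join(
        payments,
        on=[ledger.trade_id == payments.trade_id,
            ledger.amount == payments.amount],
        how="inner",
    )

    # Heuristic pass over the leftovers: fuzzy-match counterparty names via
    # Levenshtein distance and tolerate sub-cent amount differences.
    unmatched = ledger.join(payments, on="trade_id", how="left_anti")
    candidates = (
        unmatched.alias("l")
        .crossJoin(payments.alias("p"))  # a blocking key would replace this at scale
        .where(
            (F.levenshtein(F.col("l.counterparty"), F.col("p.counterparty")) <= 2)
            & (F.abs(F.col("l.amount") - F.col("p.amount")) < 0.01)
        )
    )

In practice the cross join in the heuristic pass would be constrained by a blocking key (e.g., trade date) so only plausible pairs are compared.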

Responsibilities

  • Design, develop, and test PySpark-based applications for automating data reconciliation processes.
  • Implement efficient data transformation and matching algorithms using PySpark and relevant big data frameworks.
  • Develop robust error handling and exception management mechanisms within Spark jobs (one such pattern is sketched after this list).
  • Collaborate with business analysts and data architects to understand data requirements and matching criteria.
  • Analyze and interpret data structures, formats, and relationships for effective data matching algorithms.
  • Work with distributed datasets in Spark to ensure optimal performance for large-scale data reconciliation.
  • Integrate PySpark applications with rules engines to implement complex data matching rules.
  • Develop PySpark code to interact with the rules engine and manage rule execution.
  • Collaborate with cross-functional teams to identify and analyze data gaps and inconsistencies between systems.
  • Design and develop PySpark-based solutions to address data integration challenges and ensure data quality.
  • Contribute to the development of data governance and quality frameworks.
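One common pattern for the error handling called for above is to keep per-row failures inside the job rather than letting a single bad record fail it: capture the exception alongside the row, split good rows from bad, and quarantine the failures for review. The sketch below is illustrative only; the amount_str column and the file paths are hypothetical.

    # Illustrative error-handling pattern for Spark jobs: route rows that
    # fail a transformation to a quarantine table instead of aborting.
    # Column and path names are hypothetical.
    from pyspark.sql import SparkSession, functions as F, types as T

    spark = SparkSession.builder.appName("recon-errors").getOrCreate()

    def parse_amount(raw):
        """Return (value, error) so a failure stays attached to its row."""
        try:
            return (float(raw), None)
        except (TypeError, ValueError) as exc:
            return (None, str(exc))

    parse_udf = F.udf(parse_amount, T.StructType([
        T.StructField("value", T.DoubleType()),
        T.StructField("error", T.StringType()),
    ]))

    raw = spark.read.csv("/data/incoming", header=True)
    parsed = raw.withColumn("parsed", parse_udf(F.col("amount_str")))

    good = parsed.where(F.col("parsed.error").isNull())
    bad = parsed.where(F.col("parsed.error").isNotNull())

    good.write.mode("overwrite").parquet("/data/clean")      # continue the pipeline
    bad.write.mode("append").parquet("/data/quarantine")     # keep failures for review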

Requirements

  • Bachelor's degree in Computer Science or a related field.
  • 5+ years of hands-on big data development experience, ideally building data-intensive applications.
  • Strong understanding of data reconciliation principles, techniques, and best practices.
  • Proficiency in PySpark, Apache Spark, and related big data technologies for data processing and integration.
  • Experience with rules engine integration and development.
  • Strong analytical and problem-solving skills, with the ability to translate business requirements into technical solutions.
  • Excellent communication and collaboration skills to work effectively with business analysts, data architects, and other team members.

Nice-to-haves

  • Familiarity with data streaming platforms (e.g., Kafka, Kinesis) and big data technologies (e.g., Hadoop, Hive, HBase).