Photon - Jersey City, NJ
posted about 2 months ago
The PySpark Data Reconciliation Engineer will be responsible for designing, developing, and testing applications that automate data reconciliation processes across various financial data sources, including relational databases, NoSQL databases, batch files, and real-time data streams. The engineer will implement efficient data transformation and matching algorithms, both deterministic and heuristic, using PySpark and relevant big data frameworks, and will develop robust error handling and exception management mechanisms to ensure data integrity and system resilience within Spark jobs.

In addition to development, the engineer will collaborate closely with business analysts and data architects to understand data requirements and matching criteria. This involves analyzing and interpreting data structures, formats, and relationships in order to implement effective matching algorithms in PySpark, and working with distributed datasets in Spark to ensure optimal performance for large-scale reconciliation tasks.

Another critical component of the position is the integration of PySpark applications with rules engines, such as Drools, to implement and execute complex data matching rules. The engineer will develop PySpark code that interacts with the rules engine, manages rule execution, and handles rule-based decision-making processes.

Finally, the engineer will collaborate with cross-functional teams to identify and analyze data gaps and inconsistencies between systems, designing and developing PySpark-based solutions that address data integration challenges and ensure data quality. The engineer will also contribute to the organization's data governance and quality frameworks.
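To give candidates a sense of the matching work described above, here is a minimal, hypothetical sketch of deterministic and heuristic reconciliation in PySpark. It assumes two feeds that share a trade_id business key and carry counterparty and amount columns; the paths, column names, and thresholds are illustrative assumptions, not details from this posting.

```python
# A sketch of deterministic (exact-key) plus heuristic (fuzzy) matching.
# Paths, columns, and tolerances below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("reconciliation-sketch").getOrCreate()

source_a = spark.read.parquet("/data/source_a")   # hypothetical feeds
source_b = spark.read.parquet("/data/source_b")

# Deterministic match: exact join on a shared business key.
deterministic = source_a.alias("a").join(
    source_b.alias("b"), on="trade_id", how="inner"
)

# Heuristic match for the residue: fuzzy counterparty names plus an
# amount tolerance, scored so analysts can review borderline pairs.
unmatched_a = source_a.join(source_b, on="trade_id", how="left_anti")
unmatched_b = source_b.join(source_a, on="trade_id", how="left_anti")

heuristic = (
    unmatched_a.alias("a")
    .crossJoin(unmatched_b.alias("b"))
    .withColumn("name_dist", F.levenshtein(F.col("a.counterparty"),
                                           F.col("b.counterparty")))
    .withColumn("amt_diff", F.abs(F.col("a.amount") - F.col("b.amount")))
    .where((F.col("name_dist") <= 3) & (F.col("amt_diff") < 0.01))
)
```

At production scale the cross join would be constrained by blocking keys (for example, trade date) before fuzzy scoring, since an unrestricted cross join over large unmatched sets does not scale.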
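The error handling and exception management responsibility typically follows a quarantine pattern: rather than letting one bad record fail a Spark job, records that fail parsing or validation are diverted for later review. The sketch below assumes a JSON batch feed and a couple of simple business rules; the schema, paths, and rules are illustrative only.

```python
# A sketch of row-level exception handling in a Spark job: corrupt or
# invalid rows are quarantined instead of failing the whole batch.
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("recon-error-handling").getOrCreate()

# Declaring _corrupt_record in the schema lets PERMISSIVE mode capture
# unparseable lines instead of dropping or failing on them.
schema = T.StructType([
    T.StructField("trade_id", T.StringType()),
    T.StructField("amount", T.DoubleType()),
    T.StructField("_corrupt_record", T.StringType()),
])

raw = (spark.read.schema(schema)
       .option("mode", "PERMISSIVE")
       .option("columnNameOfCorruptRecord", "_corrupt_record")
       .json("/data/incoming/batch.json"))      # hypothetical feed

# Flag rows that are structurally corrupt or violate basic business rules.
checked = raw.withColumn(
    "error",
    F.when(F.col("_corrupt_record").isNotNull(), F.lit("unparseable record"))
     .when(F.col("amount").isNull(), F.lit("missing amount"))
     .when(F.col("amount") < 0, F.lit("negative amount")),
)

clean = checked.where(F.col("error").isNull()).drop("error", "_corrupt_record")
quarantine = checked.where(F.col("error").isNotNull())

clean.write.mode("append").parquet("/data/reconciled/clean")
quarantine.write.mode("append").parquet("/data/reconciled/quarantine")
```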
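As for the rules engine integration, one common deployment pattern is to host Drools rules behind a KIE Server REST endpoint and have each Spark partition submit candidate matches for evaluation. The sketch below follows that pattern; the URL, request payload, and response shape are assumptions for illustration, not a documented contract, and the requests library would need to be available on the executors.

```python
# A sketch of rule-based decisioning from PySpark against an externally
# hosted rules service. Endpoint and payload shape are assumed, not given.
import requests
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("recon-rules-sketch").getOrCreate()
candidates = spark.read.parquet("/data/reconciled/candidates")  # hypothetical

RULES_URL = ("http://rules-host:8080/kie-server/services/rest/"
             "server/containers/instances/recon-rules")  # assumed deployment

def apply_rules(rows):
    # One HTTP session per partition keeps connection overhead off the rows.
    session = requests.Session()
    for row in rows:
        resp = session.post(
            RULES_URL,
            json={"lookup": "recon-session", "facts": row.asDict()},
            headers={"Content-Type": "application/json"},
            timeout=30,
        )
        # Response handling is illustrative; a real service would define
        # its own result schema and error semantics.
        decision = resp.json().get("decision", "review") if resp.ok else "error"
        yield Row(**row.asDict(), decision=decision)

decided = candidates.rdd.mapPartitions(apply_rules).toDF()
```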