Wipro - Pittsburgh, PA

posted 3 months ago

Full-time
Pittsburgh, PA
10,001+ employees
Professional, Scientific, and Technical Services

About the position

Wipro Limited is seeking a Python PySpark Developer to join our team in Pittsburgh, Pennsylvania. The ideal candidate will have a strong background in data integration and pipeline development, with at least 4 years of relevant experience. This role requires expertise in AWS Cloud technologies, particularly in integrating data using Apache Spark, EMR, Glue, Kafka, Kinesis, and Lambda within S3, Redshift, RDS, and MongoDB/DynamoDB ecosystems. The successful candidate will have a proven track record in Python development, especially with PySpark in an AWS Cloud environment.

In this position, you will design, develop, test, deploy, maintain, and improve data integration pipelines. You will use strong analytical skills to write complex queries, optimize them, and debug issues as they arise. Familiarity with source control systems such as Git and Bitbucket, and with Jenkins for build and continuous integration, is essential. Experience with Databricks or Apache Spark is considered a plus.

Your responsibilities will include innovating data integration solutions on our Apache Spark-based platform, ensuring that technology solutions take advantage of cutting-edge integration capabilities. You will facilitate requirements gathering and process mapping workshops, review business and functional requirement documents, and author technical design documents, testing plans, and scripts. You will also assist in implementing standard operating procedures and facilitate review sessions with functional owners and end-user representatives, using your technical knowledge to drive improvements.
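
To give a flavor of the pipeline work described above, the following is a minimal PySpark sketch of a data integration job that reads raw events from S3, cleans and aggregates them, and writes curated output back to S3. Bucket names, paths, and column names are hypothetical placeholders, and IAM access plus EMR/Glue job configuration are assumed to be handled by the execution environment; this is an illustrative sketch, not part of the posting's requirements.

# Minimal PySpark pipeline sketch: read raw JSON events from S3,
# clean and aggregate them, and write curated Parquet output back to S3.
# Bucket names, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-s3-integration-pipeline")
    .getOrCreate()
)

# Read raw events (s3:// paths as typically used on EMR/Glue with IAM access).
raw = spark.read.json("s3://example-raw-bucket/events/2024/")

# Basic cleanup: drop malformed rows and normalize the timestamp column.
cleaned = (
    raw.dropna(subset=["event_id", "event_time"])
       .withColumn("event_time", F.to_timestamp("event_time"))
)

# Aggregate events per user per day.
daily = (
    cleaned.groupBy("user_id", F.to_date("event_time").alias("event_date"))
           .agg(F.count("*").alias("event_count"))
)

# Write curated output, partitioned by date, for downstream Redshift/Athena loads.
daily.write.mode("overwrite").partitionBy("event_date") \
     .parquet("s3://example-curated-bucket/daily_user_events/")

spark.stop()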

Responsibilities

  • Design, develop, test, deploy, support, and enhance data integration solutions to connect and integrate enterprise systems in our Enterprise Data Platform.
  • Innovate data integration solutions on an Apache Spark-based platform to leverage cutting-edge integration capabilities.
  • Facilitate requirements gathering and process mapping workshops.
  • Review business and functional requirement documents, author technical design documents, testing plans, and scripts.
  • Assist with implementing standard operating procedures and facilitate review sessions with functional owners and end-user representatives.

Requirements

  • 4+ years of experience in data integration and pipeline development.
  • 2+ years of experience with AWS Cloud data integration using Apache Spark, EMR, Glue, Kafka, Kinesis, and Lambda within S3, Redshift, RDS, and MongoDB/DynamoDB ecosystems.
  • Strong hands-on experience in Python development, especially with PySpark in an AWS Cloud environment.
  • Experience with Python and common Python libraries.
  • Strong analytical experience with databases, including writing complex queries, query optimization, debugging, user-defined functions, views, and indexes (illustrated in the short sketch after this list).
  • Strong experience with source control systems such as Git and Bitbucket, and with Jenkins for build and continuous integration.
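
As an illustration of the query and user-defined-function skills listed above, here is a minimal Spark SQL sketch: it registers a hypothetical DataFrame as a temporary view, defines a small Python UDF, and runs an aggregate query. Table and column names are invented for the example and are not part of the role itself.

# Minimal Spark SQL sketch: temp view, Python UDF, and an aggregate query.
# All data, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("example-query-sketch").getOrCreate()

# Hypothetical orders data standing in for a real Redshift/RDS-backed dataset.
orders = spark.createDataFrame(
    [(1, "alice", 120.0), (2, "bob", 35.5), (3, "alice", 64.0)],
    ["order_id", "customer", "amount"],
)

# A small Python UDF, registered for use in Spark SQL.
def amount_band(amount):
    return "high" if amount >= 100 else "low"

spark.udf.register("amount_band", amount_band, StringType())

# Register a temporary view and run a query mixing grouping with the UDF.
orders.createOrReplaceTempView("orders")
summary = spark.sql("""
    SELECT customer,
           amount_band(MAX(amount)) AS top_band,
           SUM(amount)              AS total_spent
    FROM orders
    GROUP BY customer
    ORDER BY total_spent DESC
""")

# explain() is a typical first step when debugging or optimizing a slow query.
summary.explain()
summary.show()

spark.stop()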

Nice-to-haves

  • Experience with Databricks or Apache Spark.