Ojus - New York, NY

posted about 2 months ago

Full-time
New York, NY

About the position

We are seeking an experienced Scala and PySpark Developer to join our team. The ideal candidate will have a strong background in developing scalable data solutions using Scala and PySpark, with a solid understanding of big data processing frameworks. This role involves working closely with data engineers, architects, and analysts to build data pipelines and support analytics across large datasets.

Responsibilities

  • Design, develop, and optimize data pipelines and ETL processes using Scala and PySpark.
  • Implement scalable and high-performance data processing solutions for large datasets.
  • Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
  • Optimize and troubleshoot data workflows to ensure efficiency and reliability.
  • Work with Apache Spark to design and manage distributed data processing solutions.
  • Manage data ingestion, data transformation, and data extraction to support downstream analytics.
  • Ensure data quality and consistency through data validation and error handling mechanisms.
  • Document and maintain code and data flow diagrams to support ongoing projects.
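The data-quality bullet above (validation and error handling) can be sketched in a framework-agnostic way. This is a minimal Python illustration, not the team's actual pipeline; the record schema (`user_id`, `event_ts`, `amount`) and the function name are hypothetical. In a real PySpark job, a check like this would typically run per partition (for example inside `mapPartitions`) with failing rows routed to a quarantine location rather than aborting the job.

```python
# Hypothetical required schema for an incoming record; names are illustrative.
REQUIRED_FIELDS = {"user_id", "event_ts", "amount"}

def validate_row(row: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for one record."""
    errors = []
    missing = REQUIRED_FIELDS - row.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "amount" in row:
        try:
            if float(row["amount"]) < 0:
                errors.append("amount must be non-negative")
        except (TypeError, ValueError):
            errors.append("amount is not numeric")
    return (not errors, errors)

good = {"user_id": 1, "event_ts": "2024-01-01T00:00:00Z", "amount": "19.99"}
bad = {"user_id": 2, "amount": "-5"}

print(validate_row(good))
print(validate_row(bad))
```

Collecting all errors per row (instead of failing on the first) makes quarantined records easier to triage downstream.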

Requirements

  • 5+ years of experience in data engineering with a focus on Scala and PySpark.
  • Proficiency with Apache Spark for data processing, including RDDs, DataFrames, and Datasets.
  • Strong understanding of Hadoop ecosystems and distributed computing concepts.
  • Experience with SQL and database systems like Hive, HBase, or Cassandra.
  • Familiarity with data lake architecture, cloud platforms (AWS, Google Cloud Platform, Azure), and big data storage solutions.
  • Proficiency in data modeling, data warehousing, and ETL design.
  • Solid understanding of data partitioning, shuffling, and performance optimization in Spark.
  • Strong analytical and problem-solving skills with attention to detail.
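The partitioning and shuffling requirement above can be illustrated without a Spark cluster. The sketch below is a hypothetical stand-in for what Spark's `HashPartitioner` does during a shuffle: every row is routed by the hash of its key, so all rows sharing a key land in the same partition, and a skewed key distribution produces one oversized, slow partition.

```python
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Group rows into partitions by hashing a key column.

    Mirrors the idea behind hash partitioning in a shuffle: rows with
    the same key always map to the same partition, which is what makes
    key skew a performance problem.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

# "b" is deliberately over-represented to simulate key skew.
rows = [{"user": u, "v": i} for i, u in enumerate("aabbbbbbc")]
parts = hash_partition(rows, "user", 4)
print({p: len(rs) for p, rs in sorted(parts.items())})
```

In Spark, mitigations for exactly this situation include salting the skewed key, increasing `spark.sql.shuffle.partitions`, or broadcasting the smaller side of a join.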

Nice-to-haves

  • Experience with Spark Streaming or real-time data processing.
  • Knowledge of Scala libraries for data processing (e.g., Akka, Cats).
  • Familiarity with CI/CD practices and version control (e.g., Git).
  • Experience with orchestration tools like Apache Airflow.