Ojus - New York, NY

posted about 2 months ago

Full-time
New York, NY

About the position

We are seeking an experienced Scala and PySpark Developer to join our team. The ideal candidate will have a strong background in developing scalable data solutions using Scala and PySpark, with a solid understanding of big data processing frameworks. This role involves working closely with data engineers, architects, and analysts to build data pipelines and support analytics across large datasets.

Responsibilities

  • Design, develop, and optimize data pipelines and ETL processes using Scala and PySpark.
  • Implement scalable and high-performance data processing solutions for large datasets.
  • Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
  • Optimize and troubleshoot data workflows to ensure efficiency and reliability.
  • Work with Apache Spark to design and manage distributed data processing solutions.
  • Manage data ingestion, data transformation, and data extraction to support downstream analytics.
  • Ensure data quality and consistency through data validation and error handling mechanisms.
  • Document and maintain code and data flow diagrams to support ongoing projects.
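The data-quality bullet above (validation and error handling) can be sketched in a framework-agnostic way. This is a minimal Python illustration, not the team's actual pipeline; the record schema (`user_id`, `event_ts`, `amount`) and the function name are hypothetical. In a real PySpark job, a check like this would typically run per partition (for example inside `mapPartitions`) with failing rows routed to a quarantine location rather than aborting the job.

```python
# Hypothetical required schema for an incoming record; names are illustrative.
REQUIRED_FIELDS = {"user_id", "event_ts", "amount"}

def validate_row(row: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for one record."""
    errors = []
    missing = REQUIRED_FIELDS - row.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "amount" in row:
        try:
            if float(row["amount"]) < 0:
                errors.append("amount must be non-negative")
        except (TypeError, ValueError):
            errors.append("amount is not numeric")
    return (not errors, errors)

good = {"user_id": 1, "event_ts": "2024-01-01T00:00:00Z", "amount": "19.99"}
bad = {"user_id": 2, "amount": "-5"}

print(validate_row(good))
print(validate_row(bad))
```

Collecting all errors per row (instead of failing on the first) makes quarantined records easier to triage downstream.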

Requirements

  • 5+ years of experience in data engineering with a focus on Scala and PySpark.
  • Proficiency with Apache Spark for data processing, including RDDs, DataFrames, and Datasets.
  • Strong understanding of Hadoop ecosystems and distributed computing concepts.
  • Experience with SQL and database systems like Hive, HBase, or Cassandra.
  • Familiarity with data lake architecture, cloud platforms (AWS, Google Cloud Platform, Azure), and big data storage solutions.
  • Proficiency in data modeling, data warehousing, and ETL design.
  • Solid understanding of data partitioning, shuffling, and performance optimization in Spark.
  • Strong analytical and problem-solving skills with attention to detail.
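The partitioning and shuffling requirement above can be illustrated without a Spark cluster. The sketch below is a hypothetical stand-in for what Spark's `HashPartitioner` does during a shuffle: every row is routed by the hash of its key, so all rows sharing a key land in the same partition, and a skewed key distribution produces one oversized, slow partition.

```python
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Group rows into partitions by hashing a key column.

    Mirrors the idea behind hash partitioning in a shuffle: rows with
    the same key always map to the same partition, which is what makes
    key skew a performance problem.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

# "b" is deliberately over-represented to simulate key skew.
rows = [{"user": u, "v": i} for i, u in enumerate("aabbbbbbc")]
parts = hash_partition(rows, "user", 4)
print({p: len(rs) for p, rs in sorted(parts.items())})
```

In Spark, mitigations for exactly this situation include salting the skewed key, increasing `spark.sql.shuffle.partitions`, or broadcasting the smaller side of a join.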

Nice-to-haves

  • Experience with Spark Streaming or real-time data processing.
  • Knowledge of Scala libraries for data processing (e.g., Akka, Cats).
  • Familiarity with CI/CD practices and version control (e.g., Git).
  • Experience with orchestration tools like Apache Airflow.