Conch Technologies - Columbus, OH


Full-time - Mid Level
Columbus, OH
Administrative and Support Services

About the position

The Python Developer position at Conch Technologies Inc is focused on developing and maintaining data platforms using Python, Spark, and PySpark. The role requires a strong grounding in data engineering and the ability to manage migrations to PySpark on AWS. The successful candidate will design and implement robust data pipelines that support efficient data processing and analysis.

Beyond pipeline development, the role involves producing unit tests for Spark transformations and helper methods to ensure code quality and reliability. The candidate will also create Scala/Spark jobs for data transformation and aggregation, which are essential for processing large datasets, and write comprehensive Scaladoc-style documentation to keep the codebase clear for future developers. Performance optimization of Spark queries is another key responsibility, as it directly affects the efficiency of data processing tasks.

The developer will integrate with various SQL databases, including Microsoft SQL Server, Oracle, Postgres, and MySQL, to ensure seamless data access and manipulation. A solid understanding of distributed systems concepts, such as the CAP theorem, partitioning, replication, consistency, and consensus, is essential for success in this position. The role is based in Columbus, OH and requires an onsite presence.
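
As a rough illustration of the day-to-day work, the sketch below shows a minimal PySpark pipeline that reads raw data, applies a transformation, aggregates, and writes the result. The S3 paths, dataset, and column names are hypothetical, not taken from the posting.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-daily-aggregation").getOrCreate()

    # Read raw event data; Parquet on S3 is one common layout for such platforms.
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")

    # Transform: drop bad records and derive a date column for partitioning.
    daily_totals = (
        orders
        .filter(F.col("amount") > 0)
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("order_count"))
    )

    # Write the aggregate back, partitioned by date for efficient downstream reads.
    (daily_totals.write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/daily_order_totals/"))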

Responsibilities

  • Develop and maintain data platforms using Python, Spark, and PySpark.
  • Handle migration to PySpark on AWS.
  • Design and implement data pipelines.
  • Produce unit tests for Spark transformations and helper methods (a test sketch follows this list).
  • Create Scala/Spark jobs for data transformation and aggregation.
  • Write Scaladoc-style documentation for code.
  • Optimize Spark queries for performance.
  • Integrate with SQL databases (e.g., Microsoft SQL Server, Oracle, Postgres, MySQL); a JDBC sketch covering this and query optimization follows this list.
  • Understand distributed systems concepts (CAP theorem, partitioning, replication, consistency, and consensus).
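
As a rough illustration of the unit-testing responsibility, the sketch below tests a small, hypothetical add_order_date helper against a local SparkSession using pytest. Neither the helper nor the column names come from the posting.

    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def add_order_date(df):
        """Derive an order_date column from the order_ts timestamp column."""
        return df.withColumn("order_date", F.to_date("order_ts"))

    @pytest.fixture(scope="session")
    def spark():
        # local[2] keeps the test self-contained; no cluster required.
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_add_order_date(spark):
        df = (spark.createDataFrame([("2024-01-15 10:30:00",)], ["order_ts"])
                   .withColumn("order_ts", F.to_timestamp("order_ts")))
        result = add_order_date(df)
        assert result.columns == ["order_ts", "order_date"]
        assert result.first()["order_date"].isoformat() == "2024-01-15"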
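Similarly, here is a minimal sketch of the SQL-integration and query-optimization bullets, assuming a hypothetical Postgres dimension table joined to a larger dataset. Hostnames, credentials, and table names are placeholders, and the matching JDBC driver jar must be on the Spark classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("jdbc-integration").getOrCreate()

    # Read a small dimension table from Postgres; the same pattern works for
    # SQL Server, Oracle, or MySQL with the matching driver class and URL.
    regions = spark.read.jdbc(
        url="jdbc:postgresql://db.example.internal:5432/analytics",  # placeholder host
        table="public.regions",
        properties={"user": "etl_user", "password": "...",
                    "driver": "org.postgresql.Driver"},
    )

    orders = spark.read.parquet("s3://example-bucket/raw/orders/")

    # Broadcasting the small side avoids shuffling the large orders table,
    # one of the most common Spark query optimizations.
    enriched = orders.join(F.broadcast(regions), on="region_id", how="left")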

Requirements

  • Proficiency in Python, Scala (with a focus on functional programming), and Spark.
  • Familiarity with Spark APIs, including RDD, DataFrame, MLlib, GraphX, and Streaming (a brief sketch follows this list).
  • Experience working with HDFS, S3, Cassandra, and/or DynamoDB.
  • Deep understanding of distributed systems.
  • Experience with building or maintaining cloud-native applications.
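
For the Spark API familiarity above, a brief sketch contrasting the DataFrame and RDD APIs on the same hypothetical S3 dataset; paths and field names are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("api-comparison").getOrCreate()

    # DataFrame API: schema-aware, optimized by Catalyst.
    events = spark.read.json("s3://example-bucket/raw/events/")
    by_type_df = events.groupBy("event_type").count()

    # The same aggregation at the RDD level: lower-level, with explicit
    # control over partitioning, at the cost of Catalyst optimizations.
    by_type_rdd = (events.rdd
                   .map(lambda row: (row["event_type"], 1))
                   .reduceByKey(lambda a, b: a + b))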

Nice-to-haves

  • Familiarity with serverless approaches using AWS Lambda is a plus (a minimal handler sketch follows).
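
For context, a minimal sketch of a serverless entry point: an S3-triggered AWS Lambda handler following the standard S3 event shape. The downstream hand-off is left as a comment, since the posting does not specify one.

    import json

    def lambda_handler(event, context):
        # Each record describes one S3 object that landed in the raw zone.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object: s3://{bucket}/{key}")
            # In practice, this is where a Glue/EMR job submission or an
            # SQS hand-off to the PySpark pipeline would go.
        return {"statusCode": 200, "body": json.dumps("ok")}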