Conch Technologies - Phoenix, AZ

posted 3 months ago

Full-time - Entry Level
Phoenix, AZ
Administrative and Support Services

About the position

The Software Developer position at Conch Technologies Inc focuses on developing and maintaining data platforms using Python, Spark, and PySpark. A central part of the role is migrating existing systems to PySpark on AWS to improve the efficiency and scalability of data processing. The developer will design and implement robust data pipelines that move data smoothly across systems, working with AWS and Big Data technologies to ensure data is processed and stored effectively.

Beyond pipeline development, the position involves producing unit tests for Spark transformations and helper methods to ensure code quality and reliability, creating Scala/Spark jobs for data transformation and aggregation, and writing Scaladoc-style documentation so the codebase remains clear for future developers (a sketch of such a job follows below). Optimizing Spark queries for performance is a key responsibility, as is integrating with SQL databases including Microsoft SQL Server, Oracle, Postgres, and MySQL for seamless data access and manipulation. A solid understanding of distributed systems concepts, such as the CAP theorem, partitioning, replication, consistency, and consensus, is essential for success in this role.
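
To make the Scala/Spark and Scaladoc responsibilities concrete, here is a minimal sketch of an aggregation job with Scaladoc-style documentation. It is illustrative only: the object name, S3 paths, and column names are hypothetical placeholders, not Conch's actual schema.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, sum}

    /** Aggregates daily order totals per customer.
      *
      * Reads raw order events, drops cancelled orders, and emits one row
      * per (customer_id, order_date) with the summed amount. Paths and
      * columns are hypothetical placeholders.
      */
    object DailyOrderTotals {

      /** Returns per-customer, per-day totals for the given raw orders. */
      def aggregate(orders: DataFrame): DataFrame =
        orders
          .filter(col("status") =!= "CANCELLED")
          .groupBy(col("customer_id"), col("order_date"))
          .agg(sum(col("amount")).as("daily_total"))

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DailyOrderTotals").getOrCreate()
        val orders = spark.read.parquet("s3://example-bucket/raw/orders/") // hypothetical path
        aggregate(orders).write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")
        spark.stop()
      }
    }

Keeping the transformation in a pure method like aggregate, separate from the I/O in main, is what makes it straightforward to unit test.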

Responsibilities

  • Develop and maintain data platforms using Python, Spark, and PySpark.
  • Handle migration to PySpark on AWS.
  • Design and implement data pipelines.
  • Work with AWS and Big Data.
  • Produce unit tests for Spark transformations and helper methods (see the test sketch after this list).
  • Create Scala/Spark jobs for data transformation and aggregation.
  • Write Scaladoc-style documentation for code.
  • Optimize Spark queries for performance.
  • Integrate with SQL databases (e.g., Microsoft SQL Server, Oracle, Postgres, MySQL).
  • Understand distributed systems concepts (CAP theorem, partitioning, replication, consistency, and consensus).
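
As an illustration of the unit-testing responsibility, here is a minimal sketch of how a Spark transformation could be tested with ScalaTest against a local SparkSession. It exercises the hypothetical DailyOrderTotals.aggregate method from the sketch above.

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class DailyOrderTotalsSuite extends AnyFunSuite {

      // A local-mode session is sufficient for testing transformations in isolation.
      private val spark = SparkSession.builder()
        .master("local[2]")
        .appName("DailyOrderTotalsSuite")
        .getOrCreate()

      import spark.implicits._

      test("aggregate sums amounts per customer and day, ignoring cancelled orders") {
        val orders = Seq(
          ("c1", "2024-01-01", 10.0, "COMPLETED"),
          ("c1", "2024-01-01", 5.0, "COMPLETED"),
          ("c1", "2024-01-01", 99.0, "CANCELLED") // should be filtered out
        ).toDF("customer_id", "order_date", "amount", "status")

        val result = DailyOrderTotals.aggregate(orders).collect()

        assert(result.length == 1)
        assert(result.head.getAs[Double]("daily_total") == 15.0)
      }
    }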

Requirements

  • Proficiency in Python, Scala (with a focus on functional programming), and Spark.
  • Familiarity with Spark APIs, including RDD, DataFrame, MLlib, GraphX, and Streaming (a short example follows this list).
  • Experience working with HDFS, S3, Cassandra, and/or DynamoDB.
  • Deep understanding of distributed systems.
  • Experience with building or maintaining cloud-native applications.
  • Familiarity with serverless approaches using AWS Lambda is a plus.
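
As a rough sketch of the Spark API familiarity listed above, here is the same word count expressed in both the RDD and DataFrame APIs; the input path is a hypothetical placeholder.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, split}

    object WordCountBothApis {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WordCountBothApis").getOrCreate()

        // RDD API: explicit functional transformations over distributed collections.
        val rddCounts = spark.sparkContext
          .textFile("s3://example-bucket/docs/*.txt") // hypothetical path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)

        // DataFrame API: the same logic expressed declaratively, so the
        // Catalyst optimizer can plan and optimize it.
        val dfCounts = spark.read.text("s3://example-bucket/docs/*.txt")
          .select(explode(split(col("value"), "\\s+")).as("word"))
          .groupBy("word")
          .count()

        rddCounts.take(5).foreach(println)
        dfCounts.show(5)
        spark.stop()
      }
    }

The declarative DataFrame version is generally the starting point for performance work, since the optimizer can push down filters and prune columns in ways the RDD version cannot.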