Delaware Nation Industries - Atlanta, GA

posted 10 days ago

Full-time - Mid Level
Atlanta, GA

About the position

We are seeking a skilled Databricks Developer with 3-5 years of experience in Databricks and PySpark development to join our team. The ideal candidate will be proficient in writing PySpark and SQL code, with a strong background in working with Delta Lake Storage, Spark SQL, Azure Synapse, and DevOps practices. In this role, you will build and maintain efficient, scalable data pipelines, collaborate with cross-functional teams to deliver high-quality technical solutions, and ensure data governance, security, and performance optimization.

Responsibilities

  • Design, develop, and maintain data pipelines using Databricks with PySpark and Spark SQL to process large-scale datasets on the Azure platform.
  • Write PySpark code and SQL queries for efficient data transformation, manipulation, and analysis.
  • Write data to Delta Lake Storage for persistence, ensuring that data is processed and stored in an optimized, performant manner.
  • Collaborate with cross-functional teams to understand business requirements and translate them into scalable data pipeline solutions.
  • Work with Azure Synapse for data warehousing and analytical processing in a cloud environment, optimizing performance and ensuring data accessibility.
  • Work closely with DevOps teams to implement CI/CD pipelines for automated deployment and integration of data pipelines.
  • Ensure best practices for data governance, security, and performance optimization are followed when designing and maintaining data pipelines.
  • Troubleshoot, debug, and optimize complex data pipelines to ensure smooth operation and minimal downtime.
  • Provide technical guidance to team members and participate in code reviews to maintain high coding standards and quality.
  • Collaborate with stakeholders to deliver high-quality, data-driven solutions that meet business objectives.

Requirements

  • 3-5 years of experience as a Databricks Developer or similar role with a strong focus on PySpark coding and SQL development.
  • Proficient in PySpark and SQL for data transformation, querying, and analysis.
  • Experience writing to Delta Lake Storage and an understanding of how to optimize the performance of data stored in Delta format.
  • Strong experience with Spark SQL for querying and managing large datasets in a distributed computing environment.
  • Experience working with Azure Synapse for cloud-based data warehousing and analytics.
  • Solid understanding of DevOps principles and experience with CI/CD pipelines for automated deployment using tools such as Azure DevOps or similar.
  • Strong problem-solving skills and the ability to work independently as well as part of a team.
  • Excellent communication skills to effectively collaborate with stakeholders at various levels of the organization.

Nice-to-haves

  • Prior exposure to healthcare or scientific domains is a plus.
  • Familiarity with Azure Data Factory for orchestrating data workflows in the cloud.
  • Exposure to other big data technologies such as Apache Hadoop or Apache Flink.
  • Experience with data governance tools and techniques, including data lineage, auditing, and security best practices in cloud environments.
  • Familiarity with containerization technologies such as Docker or Kubernetes.