Applab Systems - Princeton, NJ

Full-time - Mid Level
Princeton, NJ
Professional, Scientific, and Technical Services

About the position

We are seeking an Azure Data Engineer to develop and manage ETL pipelines and data processing solutions that support AI/ML initiatives. The ideal candidate has advanced skills in Python, Apache Spark, and Azure data engineering services such as Azure Synapse and Azure Data Factory, along with a strong background in Azure cloud technologies. The role involves collaborating with cross-functional teams to translate business requirements into effective data solutions and ensuring the data platform supports both real-time and batch processing needs.

Responsibilities

  • Design, develop, and optimize scalable ETL processes using Python, Apache Spark, and Azure Synapse (see the PySpark sketch after this list).
  • Build and manage Azure Data Factory pipelines to orchestrate complex data workflows.
  • Use SQL Pools and Spark Pools within Synapse to manage and process large datasets efficiently.
  • Implement Data Warehousing solutions using Azure Synapse Analytics to provide structured and queryable data layers.
  • Ensure the data platform supports real-time and batch AI/ML data requirements.
  • Build, configure, and manage CI/CD pipelines on Azure DevOps for ETL and data processing tasks.
  • Automate infrastructure provisioning, testing, and deployment using Infrastructure-as-Code (IaC) tools like ARM templates or Terraform.
  • Optimize Azure Data Lake Storage (ADLS Gen2) to store and manage raw and processed data efficiently, ensuring proper access control and data security.
  • Collaborate with Data Scientists, Data Engineers, ML Engineers, and Business Analysts to translate business requirements into data solutions.
  • Work with the DevOps and Security teams to ensure smooth and secure deployment of applications and pipelines.
  • Act as the technical lead in designing, developing, and implementing data solutions, mentoring junior team members.
  • Develop and integrate with external and internal APIs for data ingestion and data exchange.
  • Build, test, and deploy RESTful APIs for secure data access.
  • Use Kubernetes for containerizing and deploying data processing applications.
  • Manage data storage and transformation to support advanced Data Science and AI/ML models.
  • Participate in and lead Agile ceremonies, such as sprint planning, daily stand-ups, and retrospectives.
  • Collaborate with cross-functional teams in iterative development to ensure high-quality and timely feature delivery.
  • Adapt to changing project priorities and business needs in an Agile environment.
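
To give a concrete flavor of the ETL work described above, the following minimal PySpark sketch reads raw JSON from an ADLS Gen2 container, applies a simple cleansing step, and writes partitioned Parquet to a curated zone. The storage account, container, and column names are hypothetical placeholders; real pipelines would take these from parameters managed in Azure Data Factory or Synapse.

```python
# Minimal PySpark ETL sketch (illustrative only).
# The ADLS Gen2 account, containers, and column names below are hypothetical
# placeholders, not details from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-curated-etl").getOrCreate()

RAW_PATH = "abfss://raw@examplestorageacct.dfs.core.windows.net/events/"
CURATED_PATH = "abfss://curated@examplestorageacct.dfs.core.windows.net/events/"

# Extract: read raw JSON events from the data lake.
raw_df = spark.read.json(RAW_PATH)

# Transform: basic cleansing plus a derived partition column.
curated_df = (
    raw_df
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_ts").isNotNull())
)

# Load: write partitioned Parquet back to the curated zone.
(
    curated_df.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet(CURATED_PATH)
)
```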

Requirements

  • Expertise in Python and Apache Spark for large-scale data processing.
  • Strong experience in Azure Synapse Analytics, including SQL Pools and Spark Pools.
  • Advanced proficiency in Azure Data Factory for ETL pipeline orchestration and management.
  • Knowledge of Data Warehousing principles, with hands-on experience building solutions on Azure.
  • Experience with SQL, including complex queries, optimization, and performance tuning.
  • Familiarity with CI/CD tools like Azure DevOps and managing infrastructure in Azure Cloud.
  • Experience in Java for API integration and microservices architecture.
  • Hands-on knowledge of Kubernetes for containerized data processing environments.
  • Proficiency in working with Azure Data Lake Storage (ADLS) Gen2 for data storage and management.
  • Experience working with APIs (REST, SOAP) and building API-based data integrations (see the sketch after this list).
  • Experience working in an Agile environment, using Scrum or Kanban.
  • Ability to lead, mentor, and coach junior developers in the team.
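
As a small illustration of the API-based data integration experience listed above, this Python sketch pulls paginated records from a hypothetical REST endpoint and lands them as JSON lines for downstream ETL. The endpoint URL, token handling, and pagination scheme are assumptions for illustration, not part of this posting.

```python
# Illustrative REST ingestion sketch. The endpoint URL, token, and
# pagination scheme are hypothetical; a real integration would follow
# the source API's documented contract and proper secret management.
import json
import requests

API_URL = "https://api.example.com/v1/records"   # hypothetical endpoint
API_TOKEN = "replace-with-a-secret-from-key-vault"

def fetch_records(url: str, token: str):
    """Yield records from a paginated REST API, following 'next' links."""
    headers = {"Authorization": f"Bearer {token}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload.get("items", [])
        url = payload.get("next")  # None ends the loop

if __name__ == "__main__":
    # Land the raw records as JSON lines for an ETL job to pick up.
    with open("records.jsonl", "w", encoding="utf-8") as out:
        for record in fetch_records(API_URL, API_TOKEN):
            out.write(json.dumps(record) + "\n")
```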

Nice-to-haves

  • Azure certifications in data engineering or cloud architecture.
  • Experience deploying AI/ML models on cloud platforms.
  • Familiarity with Data Governance best practices, ensuring compliance with data privacy regulations.