Insight Global - Rahway, NJ

posted about 2 months ago

Full-time - Mid Level
Rahway, NJ
Administrative and Support Services

About the position

As a Data Engineer at Insight Global, you will play a crucial role in designing, developing, and maintaining data pipelines that extract data from various sources to populate our data lake and data warehouse. Your responsibilities will include developing data transformation rules and enhancing our data modeling capabilities. You will collaborate closely with Product Analysts, Data Scientists, and Machine Learning Engineers to identify and transform data, ensuring it is understandable and actionable for our teams.

In this position, you will implement data quality checks and maintain data catalogs, utilizing orchestration, logging, and monitoring tools. A strong understanding of information architecture concepts and their implementation is essential, along with proficiency in Git for version control and familiarity with various branching strategies. You will also employ test-driven development methodologies when building ELT/ETL pipelines, ensuring high-quality deliverables.

Good documentation skills throughout the Software Development Life Cycle (SDLC) are required, as is the ability to communicate effectively and collaborate with other teams to develop innovative solutions. You will demonstrate a growth mindset and work alongside enterprise teams to achieve our goals. Additionally, experience with DevSecOps practices, including continuous integration (CI) and continuous delivery (CD), source code version control (e.g., Git), infrastructure-as-code (e.g., CloudFormation, Terraform), and containerization (e.g., Docker) is highly valued.

Responsibilities

  • Design, develop, and maintain data pipelines that extract data from various sources and populate the data lake and data warehouse.
  • Develop data transformation rules and data modeling capabilities.
  • Collaborate with Product Analysts, Data Scientists, and Machine Learning Engineers to identify and transform data so it is understandable and actionable.
  • Implement data quality checks and maintain data catalogs.
  • Utilize orchestration, logging, and monitoring tools for data management.
  • Employ test-driven development methodology for building ELT/ETL pipelines.
  • Document processes and maintain good SDLC documentation.
  • Communicate and collaborate effectively with other teams to develop solutions.
  • Apply DevSecOps practices including CI/CD, source code version control, and infrastructure-as-code.

Requirements

  • 5-8+ years of Data Engineering experience based out of India.
  • Solid experience with Amazon Web Services (AWS), including S3, IAM, Redshift, SageMaker, Glue, Lambda, Step Functions, and CloudWatch.
  • Experience with platforms like Databricks and Dataiku.
  • Proficiency in Python, Java, SQL (Redshift preferred), Jenkins, CloudFormation, Terraform, Git, and Docker.
  • 2-3 years of experience with Spark and PySpark, particularly in cheminformatics, and familiarity with pharmaceutical research.
  • Experience with the Atlassian stack of tools for agile software development (e.g., Jira, Confluence).