EPAM Systems - Conshohocken, PA

posted 4 days ago

Full-time - Mid Level
Professional, Scientific, and Technical Services

About the position

The Senior Data Integration Engineer will play a crucial role in supporting a digital transformation project for one of EPAM's top clients. This position involves developing, implementing, and maintaining big data solutions on a cloud platform, with a focus on optimizing performance and managing data pipelines. The role offers opportunities for skill advancement and growth within a global organization.

Responsibilities

  • Creating and integrating tables from different data sources in the cloud for data acquisition, integration, and analysis
  • Collecting router logs, storing them in AWS S3, and analyzing them with Athena to generate statistics
  • Performing transformations and actions on Spark RDDs to serve as a staging layer for ETL
  • Developing automation frameworks that connect to multiple clusters and databases in the cloud, such as HBase, MongoDB, Oracle, Teradata, and SQL Server, to achieve simultaneous dataflow
  • Configuring Glue crawlers, tables, and Athena external tables from the S3 data source to run SQL queries
  • Querying large datasets with Boto3, Pandas, and Python, connecting to AWS S3, HBase, and Athena, and scheduling multiple jobs on EMR
  • Using Python, pandas, Boto3, and Spark to write reusable code modules that handle large numbers of datasets
  • Developing APIs using Lambda and API Gateway to achieve an abstraction layer to integrate with on-prem systems
  • Building end-to-end test cases to validate the data flow of the APIs
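Several of the responsibilities above (collecting logs into S3, running Athena SQL over them, pulling results into pandas) fit a common pattern. The sketch below shows one minimal way to do it with boto3; the table name `router_logs`, the `event_date` partition column, and the `device_id` field are illustrative assumptions, not details from the posting.

```python
import time


def build_log_query(table: str, day: str) -> str:
    """Build an aggregation query over partitioned router logs.

    The table and column names here are placeholders for illustration.
    """
    return (
        f"SELECT device_id, COUNT(*) AS events "
        f"FROM {table} WHERE event_date = '{day}' "
        f"GROUP BY device_id"
    )


def run_athena_query(sql: str, database: str, output_s3: str):
    """Submit a query to Athena, poll until it finishes, return a DataFrame.

    boto3 and pandas are imported lazily so the pure query-building helper
    above can be used without AWS dependencies installed.
    """
    import boto3
    import pandas as pd

    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # Athena returns the header row first; the rest are data rows.
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    header = [c["VarCharValue"] for c in rows[0]["Data"]]
    data = [[c.get("VarCharValue") for c in r["Data"]] for r in rows[1:]]
    return pd.DataFrame(data, columns=header)
```

In practice the `output_s3` location would point at a scratch bucket (Athena writes its result files there), and the polling loop would use a timeout or the AWS-provided waiters rather than looping forever.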

Requirements

  • Self-driven with the ability to work independently and develop solutions without close supervision
  • Hands-on experience with AWS services
  • Very good understanding of big data pipelines and common databases, both relational (MySQL, PostgreSQL) and NoSQL (MongoDB, HBase)
  • Extensive experience with SQL, including modifying and writing complex queries, particularly using window functions
  • Experience performing batch/real-time processing using Spark
  • Experience designing and developing appropriate test automation frameworks and data validation techniques
  • Understanding of the networking stack and wireless protocols, preferably 802.11
  • Experience working with routers and embedded consumer products, preferably with knowledge of the TR-069 CPE WAN Management Protocol
  • Experience with Cloud testing tools
  • Very strong scripting experience using Spark and Python
  • Experience in the Telecommunications Industry is preferred
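The requirements above single out complex SQL with window functions. A small self-contained illustration of the pattern follows, using Python's built-in sqlite3 as a stand-in for whatever warehouse engine the role actually uses; the `sessions` table and its columns are invented for the example.

```python
import sqlite3

# In-memory database with a toy table of per-device sessions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (device_id TEXT, ts INTEGER, bytes INTEGER);
INSERT INTO sessions VALUES
  ('a', 1, 100), ('a', 2, 300),
  ('b', 1, 50),  ('b', 2, 75);
""")

# Window functions: rank each session within its device by bytes
# transferred, and carry the device's total alongside each row.
rows = conn.execute("""
SELECT device_id, ts, bytes,
       RANK() OVER (PARTITION BY device_id ORDER BY bytes DESC) AS rk,
       SUM(bytes)  OVER (PARTITION BY device_id) AS total_bytes
FROM sessions
ORDER BY device_id, rk
""").fetchall()

for row in rows:
    print(row)
```

The same `PARTITION BY ... ORDER BY` syntax carries over to PostgreSQL, MySQL 8+, and Athena's Presto/Trino dialect.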

Nice-to-haves

  • Experience with additional ETL tools
  • Familiarity with data governance and data quality frameworks

Benefits

  • Medical, Dental and Vision Insurance (Subsidized)
  • Health Savings Account
  • Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
  • Short-Term and Long-Term Disability (Company Provided)
  • Life and AD&D Insurance (Company Provided)
  • Employee Assistance Program
  • Unlimited access to LinkedIn learning solutions
  • Matched 401(k) Retirement Savings Plan
  • Paid Time Off - the employee will be eligible to accrue 15-25 paid days off per year
  • Paid Holidays - nine (9) total per year
  • Legal Plan and Identity Theft Protection
  • Accident Insurance
  • Employee Discounts
  • Pet Insurance
  • Employee Stock Purchase Program
  • Participation in the discretionary annual bonus program
  • Participation in the discretionary Long-Term Incentive (LTI) Program