The Friedkin Group - Tomball, TX
posted 2 months ago
As a Lead Data Engineer within the Trailblazer initiative at The Friedkin Group (TFG), you will play a crucial role in architecting, implementing, and managing robust, scalable data infrastructure. This position demands a blend of systems engineering, data integration, and data analytics skills to enhance TFG's data capabilities, supporting advanced analytics, machine learning projects, and real-time data processing needs.

You will design and implement scalable, reliable data pipelines to ingest, process, and store diverse data at scale, using technologies such as Apache Spark, Hadoop, and Kafka. You will work within cloud environments like AWS or Azure, leveraging services including but not limited to EC2, RDS, S3, Lambda, and Azure Data Lake for efficient data handling and processing. You will develop and optimize data models and storage solutions (SQL, NoSQL, Data Lakes) to support operational and analytical applications, ensuring data quality and accessibility. You will also use ETL tools and frameworks (e.g., Apache Airflow, Talend) to automate data workflows, so that data is integrated efficiently and available to analytics on time.

Collaboration is key in this position: you will work closely with data scientists, providing the data infrastructure and tools their complex analytical models require and writing data processing scripts in Python or R. You will ensure compliance with data governance and security policies, implementing best practices in data encryption, masking, and access controls within a cloud environment. You will monitor and troubleshoot data pipelines and databases for performance issues, applying tuning techniques to optimize data access and throughput. Staying abreast of emerging technologies and methodologies in data engineering is essential, as you will advocate for and implement improvements to the data ecosystem.