LLM-Driven Extraction of Unstructured Data: Built for API Deployments & ETL Pipeline Workflows
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
The simplest way to scale Python.
Data pipelines from re-usable components
The open-source Useful SDK. One Python decorator in the Useful library allows for full observability of Python functions within an ETL.
Data Cleaning for PySpark
A project structure for doing and sharing data engineering work.
Link to the application
DataSift automatically applies a data pre-processing pipeline to Data Science Projects.
End-to-end Azure Data Factory project transforming raw sales data into customer-level insights using pivot transformation and storing results in Blob Storage.
This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.
Big Data ETL pipeline for Brazilian e-commerce data. Implements data ingestion, transformation, and storage using Apache Spark, Hadoop, and SQL. Designed for scalable data processing and analytics.
Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL.
Complete portfolio of data engineering projects from Udacity's Data Engineering with AWS Nanodegree.
Modern Data Warehouse and Analytics Project implementing Medallion Architecture (Bronze, Silver, Gold) with ETL pipelines, SQL data modeling, and analytical reporting.
IBM Relational Database Administrator with GenAI Certificate Portfolio: a comprehensive collection of projects, labs, and assignments showcasing expertise in relational database administration, data warehousing, ETL pipelines, and Generative AI integration for modern database management.
A comprehensive showcase of projects and skills from the IBM Data Engineering Professional Certificate! Features include: ETL pipelines, data warehousing, big data processing with Spark/Hadoop, database administration, and business intelligence dashboards. Built to demonstrate real-world data engineering capabilities!
Master the AWS Data Stack! This repository features 15+ Industrial Data Engineering Projects covering Serverless ETL, Real-Time Streaming, & Data Warehousing. Hands-on labs for S3, Lambda, Spark, Airflow, Snowflake, Redshift, Kinesis, & Glue. Includes production-grade CI/CD pipelines. A complete roadmap to becoming a top Data Professional.
This repository contains my first end-to-end Data Engineering project, built using Microsoft Azure Cloud and Azure Databricks with PySpark.