URL: https://dzone.com/users/5350039/vvivek4ever.html

Vivek Venkatesan - DZone Member


Vivek Venkatesan

Lead Data Engineer at Vanguard

US

Joined Jun 2025

About

Data engineer with expertise in big data, serverless architectures, and real-time analytics. Passionate about building scalable pipelines and making data trustworthy. Contributor to healthcare and AI-driven analytics platforms.

Stats

Reputation: 748
Pageviews: 37.1K
Articles: 12
Comments: 0

Articles

Stop Loading Everything into Redshift: A Spectrum + Iceberg Pattern for Hybrid Analytics
Store large and cold datasets in Iceberg on S3, query them through Spectrum, and reserve Redshift local tables for workloads that need low latency or high concurrency.
June 12, 2026
· 2,065 Views
Stop Debugging Glue Jobs Manually: Building an Agentic Observability Layer for Data Pipelines
Glue failures scatter evidence across logs, metadata, and table state. A triage layer pulls it together and flags whether a rerun is safe.
June 2, 2026
· 2,209 Views · 1 Like
Why Embedding Pipelines Break at Scale and How Lakehouse Architecture Fixes Them
Use Apache Iceberg to store embeddings as versioned datasets and treat the vector database as a derived retrieval index.
April 20, 2026
· 2,304 Views
Serverless Glue Jobs at Scale: Where the Bottlenecks Really Are
At scale, Glue jobs become shuffle-bound, not CPU-bound. Skew and file strategy dominate runtime. Adding workers helps less than reshaping the workload.
March 13, 2026
· 5,086 Views · 1 Like
Semantic Contracts: The Missing Layer Between Good Data and Reliable AI
Semantic contracts prevent silent data and AI failures by enforcing shared data meaning and assumptions across pipelines in CI and at runtime.
February 4, 2026
· 2,591 Views · 1 Like
The Hidden Security Risks in ETL/ELT Pipelines for LLM-Enabled Organizations
As LLMs enter data pipelines, ETL/ELT becomes part of the AI security boundary, where untrusted inputs can introduce upstream risks.
January 7, 2026
· 3,375 Views · 2 Likes
Metadata, Not Data Volume, Is the Real Bottleneck in Modern Data Lakes
In Apache Iceberg data lakes, growing snapshots and manifests often make metadata resolution — not data scanning — the primary performance bottleneck.
January 6, 2026
· 3,316 Views
From Data Lakes to Intelligence Lakes: Augmenting Apache Iceberg With Generative AI Metadata on AWS
Build an AI-augmented data lake using Iceberg, Glue, and Bedrock to turn static metadata into searchable intelligence with semantic tags and AI summaries.
November 17, 2025
· 5,482 Views · 1 Like
Unlocking Scalable Data Lakes: Building With Apache Iceberg, AWS Glue, and S3
Apache Iceberg + AWS Glue + S3 bring ACID, schema evolution, and time travel to data lakes—fixing schema drift, small files, and cost sprawl at enterprise scale.
October 28, 2025
· 3,394 Views · 1 Like
Tutorial: RAG at Scale With Vector Databases vs Lakehouse Architectures
Learn how to scale RAG pipelines by storing embeddings in vector databases vs. lakehouses, with hands-on examples and key trade-offs.
September 9, 2025
· 3,289 Views
Top 5 Trends in Big Data Quality and Governance in 2025
Explore the top 5 trends in data quality and governance for 2025, from real-time validation to AI-powered checks and privacy-first practices.
July 10, 2025
· 2,124 Views · 2 Likes
How Trustworthy Is Big Data? A Guide to Real-World Challenges and Solutions
Big data only delivers value when it's reliable. Identify and fix trust issues like schema drift, outliers, and silent errors using Deequ and Great Expectations.
June 25, 2025
· 1,858 Views
