Context Engineering for RAG : The Four Typed Inputs Behind Every RAG Answer
Large Language Model

Enterprise Document Intelligence [Vol.1 #7bis] – Tobi Lütke and Andrej Karpathy named the practice in…

Kezhan Shi

Jun 30

19 min read

Surviving the Data Science Behavioral Interview
Data Science

In the age of AI, standing out here means a lot more than ever. Here…

Haden Pelletier

Jun 30

7 min read

Latest

How to Maximize Codex Exec Command
LLM Applications

Build a more powerful coding agent setup with a model ensemble

Eivind Kjosbakken

Jun 30

8 min read

Stop Choosing Between Local and Cloud LLMs: A Field Guide to Hybrid Patterns
LLM Applications

A hands-on walkthrough of a hybrid local-cloud workflow using Gemma 4 and GPT-5.4, with reasoning…

Shuai Guo

Jun 30

18 min read

How Far Can Classical NLP Go? From Bag-of-Words to Stacking on Spooky Author Identification
Machine Learning

An end-to-end classical NLP experiment on Kaggle’s Spooky Author Identification task: from Vowpal Wabbit and…

Nahid Ahmadvand

Jun 29

17 min read

Prompt Engineering Fails Quietly — Prompt Regression Is Why
Large Language Models

Small prompt changes can silently break critical behavior in production. This article introduces a practical…

Emmimal P Alexander

Jun 29

17 min read

I Completed Five Years in Analytics Consulting: 5 Lessons That Changed How I Work
Data Science

The tools I use for analytics and reporting have changed more than I expected, yet…

Rashi Desai

Jun 29

9 min read

How to Choose Between Small and Frontier Models
Artificial Intelligence

The rise of small language models

Sara Nobrega

Jun 29

12 min read
👁 An image of scissors cutting off the tail of a bell curve

Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows
Agentic AI

Behind a customer’s API, a high-quality answer isn’t enough. It has to be usable, which…

Frank Wittkampf

Jun 28

27 min read

I Pitted XGBoost Against Logistic Regression on 358 Matches. The Boring Model Won.
Machine Learning

A concrete bias–variance lesson: why the smallest model had the best cross-validated fit, and how…

Ari Joury, PhD

Jun 28

10 min read

We Built a Routing Layer to Cut Our AI Costs. It Broke the Product.
Agentic AI

A team cut their AI inference bill by more than half. Three months later, customer…

Pratik Rupareliya

Jun 27

21 min read

Editor’s Picks

Water Cooler Small Talk, Ep. 11: Overfitting in RAG evaluation
Large Language Models

Why memorizing for the exam doesn’t mean you understand the subject

Maria Mouschoutzi

Jun 26

10 min read

Amplify the Expert: A Philosophy for Building Enterprise RAG
Large Language Model

Enterprise Document Intelligence [Vol.1 #M1] – The thesis behind every architectural choice in this series

angela shi

Jun 26

20 min read

The Hot Path Belongs to GBDTs, Agents Own the Cold Path: A Payment-Fraud Benchmark
Machine Learning

A reproducible benchmark on latency, cost, and reproducibility, and where agents actually earn their keep.

Sandeep Bharadwaj Mannapur

Jun 25

17 min read

Your First Task as a Data Engineer in a New Company? Make the ETL Pipeline Testable
Data Engineering

A practical data engineering onboarding workflow for environment setup, automated testing, and AI-assisted development.

Jiayan Yin

Jun 24

9 min read

Why I Stopped Using One Agent and Built a Multi-Agent Pipeline Instead
Agentic AI

A practical walkthrough using text-to-SQL as the example

Priyansh Bhardwaj

Jun 24

13 min read
👁 Photo by National Institute of Allergy and Infectious Diseases on Unsplash

I Spent an Hour on a Data Preprocessing Task Before Asking Gemini
Data Science

How Gemini solved my Pandas problem in seconds, and why data science fundamentals still matter…

Soner Yıldırım

Jun 23

7 min read

GPU-Resident Top-K for Agentic RAG: I Built a CUDA Kernel So My Retrieval Step Would Stop Bouncing Off the GPU
Agentic AI

The PCIe transfer latency is silently bottlenecking your agentic inference. Here is how building a…

Anubhab Banerjee

Jun 19

31 min read

Structured Outputs with LLMs: JSON Mode, Function Calling, and When to Use Each
Large Language Models

Getting reliable, readable responses out of your LLM, and knowing which tool to reach for

Maria Mouschoutzi

Jun 18

13 min read

Your Churn Threshold Is a Pricing Decision
Data Science

How unit economics should set your classification cutoff, and why they rarely do.

Fabio Oliveira

Jun 17

15 min read

The Variable Newsletter

Exciting Changes Are Coming to the TDS Author Payment Program
Writing

Authors can now benefit from updated earning tiers and a higher article cap

TDS Editors

Mar 2

2 min read

TDS Newsletter: Vibe Coding Is Great. Until It’s Not.
The Variable

Sorting through the good, bad, and ambiguous aspects of vibe coding

TDS Editors

Feb 5

4 min read

Deep Dives

Vector RAG Isn’t Enough — I Built a Context Graph Layer for Multi-Agent Memory
Large Language Model

I benchmarked raw chat history, vector-only RAG, and a context graph on the same multi-agent…

Emmimal P Alexander

Jun 25

19 min read

Beyond the Straight Line: Choosing Between OLS, Interaction Terms, and Tweedie Regression
Data Science

Whether you should stick to a classic Ordinary Least Squares regression, introduce interaction terms, or…

Gustavo Santos

Jun 25

14 min read

3 Agents. 3 LLMs. 1 Aging GPU: Engineering Parallel Inference on Bare Metal
Agentic AI

Beat the 8GB VRAM limit. Learn how to run three different LLMs on a single…

Anubhab Banerjee

Jun 25

21 min read

Finding the right anchors for RAG: keyword, embedding, and TOC signals in parallel
Large Language Models

Enterprise Document Intelligence [Vol.1 #7B] – Retrieval is filtering on structured tables: keywords first, TOC…

angela shi

Jun 24

33 min read

Retrieval Is Filtering, Not Search: A Mental Model for Enterprise RAG
Large Language Models

Enterprise Document Intelligence [Vol.1 #7A] – Stop searching strings. Filter line_df and toc_df. Pick anchors…

angela shi

Jun 23

21 min read

Encoding Categorical Data for Outlier Detection
Data Science

Why one-hot encoding isn’t always the best approach, and alternative encodings

W Brett Kennedy

Jun 22

21 min read

URL: https://towardsdatascience.com/

Towards Data Science
