Coffee First Crack Audio Dataset

Audio dataset for training coffee first crack detection models. Contains 10-second WAV chunks from coffee roasting recordings, labelled as first_crack (popping/cracking sounds) or no_first_crack (background roast noise).

Model: syamaner/coffee-first-crack-detection
Source code: github.com/syamaner/coffee-first-crack-detection

How this was built:

Part 1 — The Architecture & The Agent — Spec-Driven ML Development with Warp/Oz

Part 2 — Building an Open Coffee Roasting Audio Dataset with Warp/Oz

Part 3 — The Science — Hyperparameter Tuning to 100% Precision with Warp/Oz

Part 4 — The Edge — Deployment of an 86M Parameter Audio Transformer

Part 5 — From Local Model to Live Demo — Publishing to Hugging Face with Warp/Oz

Original prototype:

Part 1 — Training a Neural Network to Detect Coffee First Crack from Audio

Dataset Summary

973 chunks in total (fixed 10-second windows, non-overlapping)
15 source recordings from 2 microphones, 3 coffee origins
Recording-level split (no data leakage between splits)
About 20% first_crack / 80% no_first_crack (197 vs 776 chunks), a realistic class imbalance
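
A recording-level split means every chunk inherits the split of the recording it came from, so the same roast never appears on both sides of an evaluation. The sketch below illustrates the idea only. The recording IDs are hypothetical (this card does not list file names), and the seeded random shuffle is an assumption, not necessarily how the published split was chosen.

```python
import random


def split_recordings(recording_ids, n_val, n_test, seed=0):
    """Assign whole recordings to splits so no recording spans two splits."""
    ids = sorted(set(recording_ids))
    random.Random(seed).shuffle(ids)
    return {
        "val": set(ids[:n_val]),
        "test": set(ids[n_val:n_val + n_test]),
        "train": set(ids[n_val + n_test:]),
    }


def split_of(recording_id, splits):
    """Look up which split a recording was assigned to."""
    for name, members in splits.items():
        if recording_id in members:
            return name
    raise KeyError(recording_id)


# Hypothetical recording IDs; 15 recordings split 9/3/3 as in this card.
recordings = [f"rec-{i:02d}" for i in range(1, 16)]
splits = split_recordings(recordings, n_val=3, n_test=3)

# Each chunk is (recording_id, chunk_index) and follows its recording.
chunks = [(rec, i) for rec in recordings for i in range(4)]
chunk_splits = {chunk: split_of(chunk[0], splits) for chunk in chunks}
```

Splitting at the chunk level instead would let adjacent windows from one roast land in both train and test, which tends to inflate test metrics.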

Split	first_crack	no_first_crack	Total	Recordings
Train	124	463	587	9
Val	37	158	195	3
Test	36	155	191	3
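
The counts in the table can be turned into per-split class shares and, if a weighted loss is wanted, inverse-frequency class weights. This is a minimal sketch: the counts are copied from the table above, while the weighting formula (total / (n_classes * class_count)) is a common heuristic chosen for illustration, not something this card says was used in training.

```python
# Chunk counts copied from the split table above.
SPLIT_COUNTS = {
    "train": {"first_crack": 124, "no_first_crack": 463},
    "val": {"first_crack": 37, "no_first_crack": 158},
    "test": {"first_crack": 36, "no_first_crack": 155},
}


def positive_share(counts):
    """Fraction of chunks in a split that are labelled first_crack."""
    return counts["first_crack"] / sum(counts.values())


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


for split, counts in SPLIT_COUNTS.items():
    print(f"{split}: {positive_share(counts):.1%} first_crack")

# Weights derived from the training split only (illustrative heuristic).
train_weights = inverse_frequency_weights(SPLIT_COUNTS["train"])
print(train_weights)
```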

Annotation Approach

Each source recording was annotated in Label Studio with a single first_crack region spanning from the first audible pop to the end of consistent cracking. The chunk_audio.py script then slid fixed 10-second windows across each recording and labelled each window based on overlap (>=50% threshold) with annotated first_crack regions.
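
The windowing and labelling step can be sketched as follows. This is not the code from chunk_audio.py. It assumes the 50% threshold is measured as a fraction of the window, that annotated regions do not overlap each other, and that a trailing partial window is dropped. The function names and the example region are hypothetical.

```python
WINDOW_S = 10.0     # window length in seconds, as stated in this card
MIN_OVERLAP = 0.5   # assumed: fraction of the window that must be covered


def overlap_seconds(win_start, win_end, region_start, region_end):
    """Length in seconds of the intersection of a window and a region."""
    return max(0.0, min(win_end, region_end) - max(win_start, region_start))


def label_windows(duration_s, regions, window_s=WINDOW_S, min_overlap=MIN_OVERLAP):
    """Cut a recording into back-to-back windows and label each one.

    regions is a list of (start_s, end_s) first_crack annotations.
    Returns a list of (start_s, end_s, label) tuples.
    """
    rows = []
    n_windows = int(duration_s // window_s)  # assumed: drop any partial tail
    for i in range(n_windows):
        start = i * window_s
        end = start + window_s
        covered = sum(overlap_seconds(start, end, rs, re) for rs, re in regions)
        is_crack = covered / window_s >= min_overlap
        rows.append((start, end, "first_crack" if is_crack else "no_first_crack"))
    return rows


# Hypothetical 60-second recording with one annotated region from 34 s to 52 s.
for row in label_windows(60.0, [(34.0, 52.0)]):
    print(row)
```

In this example the 30-40 s window is 60% covered and the 40-50 s window is fully covered, so both are labelled first_crack, while the 50-60 s window is only 20% covered and stays no_first_crack.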

This approach replaces the prototype method of manually annotating 20-30 small regions per file. It produces consistent real-audio training chunks that match what the model sees during inference.

Features

Feature	Type	Description
`audio`	Audio (16 kHz)	10-second mono WAV chunk
`label`	string	`first_crack` or `no_first_crack`
`label_id`	int	1 = first_crack, 0 = no_first_crack
`microphone`	string	`mic-1-original` or `mic-2-new`
`coffee_origin`	string	e.g. `brazil`, `costarica-hermosa`, `brazil-santos`
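
The feature table implies a few invariants that are cheap to check before training: 16,000 samples per second for 10 seconds gives 160,000 samples per chunk, and `label_id` should agree with `label`. The sketch below assumes a decoded sample exposes `sample["audio"]["array"]` and `sample["audio"]["sampling_rate"]`, and that every chunk is exactly 10 seconds long. `check_sample` is a hypothetical helper, not part of the dataset or its source repository.

```python
EXPECTED_RATE = 16_000
CHUNK_SECONDS = 10
EXPECTED_SAMPLES = EXPECTED_RATE * CHUNK_SECONDS  # 160,000 samples per chunk
LABEL_TO_ID = {"no_first_crack": 0, "first_crack": 1}


def check_sample(sample):
    """Return a list of problems found in one sample (empty list means OK)."""
    problems = []
    audio = sample["audio"]
    if audio["sampling_rate"] != EXPECTED_RATE:
        problems.append(f"unexpected sampling rate: {audio['sampling_rate']}")
    if len(audio["array"]) != EXPECTED_SAMPLES:
        problems.append(f"unexpected length: {len(audio['array'])} samples")
    if sample["label"] not in LABEL_TO_ID:
        problems.append(f"unknown label: {sample['label']!r}")
    elif LABEL_TO_ID[sample["label"]] != sample["label_id"]:
        problems.append("label and label_id disagree")
    return problems


# Synthetic stand-in for a decoded dataset row (silence, not real audio).
fake_sample = {
    "audio": {"array": [0.0] * EXPECTED_SAMPLES, "sampling_rate": EXPECTED_RATE},
    "label": "first_crack",
    "label_id": 1,
}
print(check_sample(fake_sample))
```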

Source Recordings

Mic	Origin	Recordings	Notes
mic-1-original	costarica-hermosa	5	Legacy recordings from prototype
mic-1-original	brazil	4	Legacy recordings from prototype
mic-2-new	brazil	4	New recordings (Feb 2026)
mic-2-new	brazil-santos	2	New recordings (Apr 2026)

Usage

from datasets import load_dataset

ds = load_dataset("syamaner/coffee-first-crack-audio")
print(ds)
# DatasetDict({
#     train: Dataset({features: [audio, label, ...], num_rows: 587})
#     val: Dataset({features: [audio, label, ...], num_rows: 195})
#     test: Dataset({features: [audio, label, ...], num_rows: 191})
# })

# Access a sample
sample = ds["train"][0]
print(sample["label"], sample["microphone"], sample["coffee_origin"])
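
Because the chunks come from two microphones and three origins, it can help to tally how labels are distributed across those sources before training. The sketch below is a hypothetical helper that works on plain column values. The rows shown are made-up placeholders, not actual dataset contents.

```python
from collections import Counter


def tally_sources(microphones, origins, labels):
    """Count chunks per (microphone, coffee_origin, label) combination."""
    return Counter(zip(microphones, origins, labels))


# Made-up placeholder columns standing in for real dataset columns.
mics = ["mic-1-original", "mic-1-original", "mic-2-new", "mic-2-new"]
origins = ["brazil", "brazil", "brazil-santos", "brazil-santos"]
labels = ["first_crack", "no_first_crack", "no_first_crack", "no_first_crack"]

for key, count in sorted(tally_sources(mics, origins, labels).items()):
    print(key, count)
```

With the dataset loaded as above, the same helper could be called as `tally_sources(ds["train"]["microphone"], ds["train"]["coffee_origin"], ds["train"]["label"])`. Reading whole columns this way avoids touching the `audio` column, so no audio should need to be decoded for the tally.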

Citation

@misc{yamaner2026coffeefc,
 author = {Yamaner, Sertan},
 title = {Coffee First Crack Audio Dataset},
 year = {2026},
 url = {https://huggingface.co/datasets/syamaner/coffee-first-crack-audio}
}
