URL: https://huggingface.co/datasets/webxos/audioform_dataset

webxos/audioform_dataset · Datasets at Hugging Face


Dataset viewer preview (image column omitted; image width 1.72k px):

| frame_id (int64) | timestamp (float64) | frequency (int64) | time_scale (int64) | capture_date (string) |
|------------------|---------------------|-------------------|--------------------|--------------------------|
| 0 | 5.365 | 0 | 1 | 2026-01-13T19:57:36.427Z |
| 1 | 5.813 | 0 | 1 | 2026-01-13T19:57:36.946Z |
| 2 | 6.219 | 0 | 1 | 2026-01-13T19:57:37.459Z |
| 3 | 6.688 | 0 | 1 | 2026-01-13T19:57:37.954Z |
| 4 | 7.147 | 0 | 1 | 2026-01-13T19:57:38.619Z |
| 5 | 7.552 | 0 | 1 | 2026-01-13T19:57:38.936Z |
| 6 | 8.032 | 0 | 1 | 2026-01-13T19:57:39.445Z |
| 7 | 8.501 | 0 | 1 | 2026-01-13T19:57:39.939Z |
| 8 | 9.003 | 0 | 1 | 2026-01-13T19:57:40.519Z |
| 9 | 9.504 | 0 | 1 | 2026-01-13T19:57:40.947Z |

Audioform_Dataset_v1

This dataset is the first output from AUDIOFORM, a Three.js-powered 3D audio visualization tool that turns audio files into timestamped visual frames with rich metadata. AUDIOFORM by webXOS is available for download in the /audioform/ folder of this repo so developers can create their own similar datasets. AUDIOFORM is a synthetic harmonic oscillator that runs in HTML; think of it as the "Hello World" / MNIST-style dataset application for audio-to-visual multimodal machine learning.

This dataset contains 10 captured frames from a short uploaded WAV file (played at 1× speed), together with per-frame metadata including dominant frequency, timestamp, and capture info.

Dataset Structure

audioform_dataset/
├── images/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ... (10 PNG frames total)
├── metadata.csv    # Main metadata file (Hugging Face viewer uses this)
└── README.md

| Column | Type | Description | Example Value |
|---------------|---------|-----------------------------------------------------------------------------|-----------------------------------|
| `file_name` | string | Relative path to the visualization PNG (required by Hugging Face) | `images/frame_0001.png` |
| `frame_id`    | int     | Sequential frame number (0-based)                                           | 0, 1, 2, …, 9                     |
| `timestamp` | float | Time in seconds when the frame was captured from the audio | 5.365, 6.219, 9.504 |
| `frequency` | int | Dominant / main detected audio frequency at capture time (Hz) | 0 (in this tiny sample) |
| `time_scale` | int | Playback speed multiplier used during visualization | 1 |
| `capture_date`| string | UTC ISO timestamp when the frame was rendered | 2026-01-13T19:57:36.427Z |
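
The column layout above can be read into typed records with Python's standard library. This is a minimal sketch, assuming `metadata.csv` has a header row with exactly the six columns in the table. The `FrameRecord` class and `load_metadata` helper are hypothetical names introduced for illustration and are not part of AUDIOFORM. The two sample rows use values shown for frames 0 and 1; the column order and the pairing of `frame_0002.png` with frame 1 are assumptions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FrameRecord:
    """One row of metadata.csv (hypothetical helper type, not part of AUDIOFORM)."""
    file_name: str
    frame_id: int
    timestamp: float
    frequency: int
    time_scale: int
    capture_date: str


def load_metadata(text_stream):
    """Parse CSV text with a header row into a list of FrameRecord objects."""
    reader = csv.DictReader(text_stream)
    return [
        FrameRecord(
            file_name=row["file_name"],
            frame_id=int(row["frame_id"]),
            timestamp=float(row["timestamp"]),
            frequency=int(row["frequency"]),
            time_scale=int(row["time_scale"]),
            capture_date=row["capture_date"],
        )
        for row in reader
    ]


# Sample rows for frames 0 and 1 (column order and file pairing are assumed).
SAMPLE_CSV = """file_name,frame_id,timestamp,frequency,time_scale,capture_date
images/frame_0001.png,0,5.365,0,1,2026-01-13T19:57:36.427Z
images/frame_0002.png,1,5.813,0,1,2026-01-13T19:57:36.946Z
"""

records = load_metadata(io.StringIO(SAMPLE_CSV))
print(records[0])

# For a file on disk, the same helper would be called as:
#   with open("metadata.csv", newline="", encoding="utf-8") as f:
#       records = load_metadata(f)
```

Converting each field at load time surfaces malformed rows early, since `int()` and `float()` raise `ValueError` on unexpected text.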

See how quickly a tiny diffusion model, GAN, or LoRA can memorize and regenerate these exact 10 styles. The frames can also serve as style references for ControlNet or IP-Adapter, or as fine-tuning data for Stable Diffusion to adopt this neon 3D audio-viz aesthetic.

 This dataset shows the **format** AUDIOFORM produces.  
 → Feed it real music, voices, field recordings, synths  
 → Generate 1k–100k+ frames  
 → Add labels (genre, instrument, mood, multiple freq peaks…)  
 → Unlock serious applications:

 - Music video auto-generation 
 - Visual audio classifiers 
 - Audio-conditioned image/video generation 
 - Interactive music → 3D art installations 
 - Novel multimodal music understanding models
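
The labeling step mentioned above could look like the following sketch, which merges extra columns into metadata rows and serializes them back to CSV. The `genre` and `mood` columns, their values, and the `add_labels` and `write_csv` helpers are hypothetical placeholders; none of them exist in this dataset or in AUDIOFORM.

```python
import csv
import io


def add_labels(rows, labels_by_file):
    """Return new row dicts with label columns merged in, keyed by file_name."""
    labeled = []
    for row in rows:
        extra = labels_by_file.get(row["file_name"], {})
        labeled.append({**row, **extra})
    return labeled


def write_csv(rows, fieldnames):
    """Serialize row dicts to CSV text; missing labels become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two abbreviated metadata rows; only frame 0 receives (made-up) labels.
rows = [
    {"file_name": "images/frame_0001.png", "frame_id": 0, "timestamp": 5.365},
    {"file_name": "images/frame_0002.png", "frame_id": 1, "timestamp": 5.813},
]
labels = {"images/frame_0001.png": {"genre": "ambient", "mood": "calm"}}

labeled_rows = add_labels(rows, labels)
csv_text = write_csv(
    labeled_rows, ["file_name", "frame_id", "timestamp", "genre", "mood"]
)
print(csv_text)
```

Keeping labels in the same CSV as `file_name` means each image stays paired with its annotations in a single file.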

Dataset Description

This dataset was generated using AUDIOFORM, a 3D audio visualization system.

  • Total Frames: 10
  • Generation Date: 2026-01-13
  • Audio Type: Uploaded WAV File
  • Time Scaling: 1x

Dataset Structure

  • images/: Contains all captured frames in PNG format
  • metadata.csv: Contains classification data for each frame

Metadata Columns

  • file_name: Relative path to the image file (e.g., images/frame_0001.png) - REQUIRED for Hugging Face
  • frame_id: Unique identifier for each frame
  • timestamp: Time in seconds when frame was captured
  • frequency: Audio frequency at capture time (Hz)
  • time_scale: Playback speed multiplier
  • capture_date: ISO date string of capture
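
Because each row carries both an audio `timestamp` and a wall-clock `capture_date`, the two can be cross-checked. The sketch below compares the gap between consecutive frames in audio time with the gap in wall-clock time, using the ten rows of this dataset. It assumes that at a `time_scale` of 1 the two clocks should advance at roughly the same rate, which the dataset card does not state. `parse_utc` and `frame_gaps` are hypothetical helper names.

```python
from datetime import datetime

# (timestamp in seconds, capture_date) for the ten frames in this dataset.
FRAMES = [
    (5.365, "2026-01-13T19:57:36.427Z"),
    (5.813, "2026-01-13T19:57:36.946Z"),
    (6.219, "2026-01-13T19:57:37.459Z"),
    (6.688, "2026-01-13T19:57:37.954Z"),
    (7.147, "2026-01-13T19:57:38.619Z"),
    (7.552, "2026-01-13T19:57:38.936Z"),
    (8.032, "2026-01-13T19:57:39.445Z"),
    (8.501, "2026-01-13T19:57:39.939Z"),
    (9.003, "2026-01-13T19:57:40.519Z"),
    (9.504, "2026-01-13T19:57:40.947Z"),
]


def parse_utc(stamp):
    """Parse an ISO 8601 string ending in 'Z' into a timezone-aware datetime."""
    # Older Python versions reject a trailing 'Z', so swap in an explicit offset.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def frame_gaps(frames):
    """Return (audio_gap, wall_gap) in seconds for each consecutive frame pair."""
    gaps = []
    for (t0, d0), (t1, d1) in zip(frames, frames[1:]):
        audio_gap = t1 - t0
        wall_gap = (parse_utc(d1) - parse_utc(d0)).total_seconds()
        gaps.append((audio_gap, wall_gap))
    return gaps


gaps = frame_gaps(FRAMES)
mean_audio = sum(a for a, _ in gaps) / len(gaps)
mean_wall = sum(w for _, w in gaps) / len(gaps)
print(f"mean audio gap: {mean_audio:.3f} s, mean wall-clock gap: {mean_wall:.3f} s")
```

A large or growing difference between the two means would suggest that rendering is not keeping pace with playback, though that interpretation depends on the assumption stated above.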

Intended Use

This dataset is intended for training machine learning models on audio visualization patterns, waveform classification, or generative AI tasks.