URL: https://dev.to/masonwritescode/build-scrub-bar-thumbnail-previews-with-ffmpeg-and-a-webvtt-sprite-3ei2

Build scrub-bar thumbnail previews with FFmpeg and a WebVTT sprite


TL;DR

We're going to add hover-preview thumbnails (the little image that follows your cursor on a video scrub bar) to a player. Backend is one FFmpeg command that builds a sprite sheet, plus a tiny script that writes a WebVTT index. Frontend points the player at the VTT. Total: well under 100 lines.

What we're building

Two static artifacts and a few lines of player config:

  1. storyboard.jpg - a sprite sheet: many small frames tiled into one image.
  2. storyboard.vtt - a WebVTT file mapping each timeline range to a rectangle in the sprite via a #xywh fragment.
  3. Player wiring (Video.js shown, hls.js note at the end).

Versions used: ffmpeg 8.0.2 (anything 7.1+ works), node 20.x.

1. Generate the sprite with FFmpeg

The core command. One frame every 10 seconds, each scaled to 160px wide, tiled into a 10x10 grid:

ffmpeg -i input.mp4 \
 -vf "fps=1/10,scale=160:-1,tile=10x10" \
 -frames:v 1 storyboard.jpg

What each filter does:

  • fps=1/10 - sample one frame per 10 seconds (not every frame).
  • scale=160:-1 - 160px wide, height auto from aspect ratio.
  • tile=10x10 - pack frames into a 10-column, 10-row grid (up to 100 tiles).

# terminal output you'll see
frame= 1 fps=0.0 q=24.0 Lsize=N/A time=N/A bitrate=N/A
video:118kB audio:0kB subtitle:0kB ...

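⚠️ A 10x10 grid covers 1000 seconds at a 10s interval. With -frames:v 1, FFmpeg writes only the first sheet, so anything past that point is silently dropped. Enlarging the grid isn't a real fix either: a very large single image can hit browser image and canvas size limits (roughly 16k–32k px per side, depending on the browser). For anything long, generate multiple sprites. We handle that in the generator below.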
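To make that capacity math concrete, here is a minimal sketch that works out how many tiles and sheets a video needs and how large each sheet comes out. It uses the same ceil-based tile count as the VTT generator in this article; sheetPlan and its option names are made up for illustration.

```javascript
// Sketch: tiles, sheets, and sheet pixel size for a given duration.
// sheetPlan and its option names are illustrative, not part of any library.
function sheetPlan(durationSec, { interval = 10, cols = 10, rows = 10, tileW = 160, tileH = 90 } = {}) {
  const perSheet = cols * rows;                        // tiles that fit on one sheet
  const tiles = Math.ceil(durationSec / interval);     // one tile per interval, last one partial
  return {
    tiles,
    sheets: Math.ceil(tiles / perSheet),
    secondsPerSheet: perSheet * interval,
    sheetWidth: cols * tileW,
    sheetHeight: rows * tileH,
  };
}

const p = sheetPlan(642.4);
console.log(`${p.tiles} tiles, ${p.sheets} sheet(s), ${p.sheetWidth}x${p.sheetHeight}px`);
// 65 tiles, 1 sheet(s), 1600x900px

console.log(sheetPlan(7200).sheets); // 8 (a two-hour video)
```

At these tile sizes a single 10x10 sheet is only 1600x900, so the size limits really only bite if you grow the grid instead of adding sheets.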

2. Know your numbers with ffprobe

Before writing the VTT, get the real duration so the last (partial) row is handled correctly:

ffprobe -v error -show_entries format=duration \
 -of csv=p=0 input.mp4
# 642.40
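If you'd rather not copy the number by hand, the same ffprobe call can be run from Node and fed straight into the generator. This is a rough sketch: parseDuration and probeDuration are hypothetical helper names, and it assumes ffprobe is on your PATH.

```javascript
// Sketch: read the duration with ffprobe from Node.
// parseDuration and probeDuration are hypothetical helper names.

// With -of csv=p=0, ffprobe prints a bare number plus a newline, e.g. "642.400000\n".
function parseDuration(stdout) {
  const sec = Number.parseFloat(String(stdout).trim());
  if (!Number.isFinite(sec) || sec <= 0) {
    // Some inputs report "N/A" here; fail loudly rather than emit a bogus VTT.
    throw new Error(`unexpected ffprobe output: ${JSON.stringify(stdout)}`);
  }
  return sec;
}

// Runs the same ffprobe command shown above. Requires ffprobe on PATH.
async function probeDuration(file) {
  // Dynamic import so the snippet works in both ES module and CommonJS files.
  const { execFileSync } = await import("node:child_process");
  const stdout = execFileSync(
    "ffprobe",
    ["-v", "error", "-show_entries", "format=duration", "-of", "csv=p=0", file],
    { encoding: "utf8" },
  );
  return parseDuration(stdout);
}
```

Calling probeDuration("input.mp4") returns a promise for the duration in seconds, which you can hand to the VTT generator in the next step.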

3. Generate the WebVTT index

This is the part that has to be exact. Each cue's time range must match the FFmpeg interval, or previews drift the deeper you scrub. Generate it from the same interval value, never by hand.

// scripts/makeVtt.js - node 20+ (ES module: set "type": "module" in package.json, or save as makeVtt.mjs)
import { writeFileSync } from "node:fs";

const INTERVAL = 10; // seconds, MUST match fps=1/INTERVAL
const TILE_W = 160;
const TILE_H = 90; // know your source aspect; 16:9 -> 90
const COLS = 10;
const ROWS = 10;
const PER_SHEET = COLS * ROWS;

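function ts(sec) {
 // Work in whole milliseconds so a fractional final cue (e.g. 642.4s) isn't truncated.
 const totalMs = Math.round(sec * 1000);
 const h = String(Math.floor(totalMs / 3600000)).padStart(2, "0");
 const m = String(Math.floor((totalMs % 3600000) / 60000)).padStart(2, "0");
 const s = String(Math.floor((totalMs % 60000) / 1000)).padStart(2, "0");
 const ms = String(totalMs % 1000).padStart(3, "0");
 return `${h}:${m}:${s}.${ms}`;
}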

export function makeVtt(durationSec, sheetName = "storyboard") {
 const count = Math.ceil(durationSec / INTERVAL);
 let out = "WEBVTT\n\n";

 for (let i = 0; i < count; i++) {
  const start = i * INTERVAL;
  const end = Math.min(start + INTERVAL, durationSec);

  const indexInSheet = i % PER_SHEET;
  const sheet = Math.floor(i / PER_SHEET); // 0, 1, 2...
  const col = indexInSheet % COLS;
  const row = Math.floor(indexInSheet / COLS);
  const x = col * TILE_W;
  const y = row * TILE_H;

  const img = `${sheetName}-${sheet}.jpg`; // matches multi-sheet output
  out += `${ts(start)} --> ${ts(end)}\n`;
  out += `${img}#xywh=${x},${y},${TILE_W},${TILE_H}\n\n`;
 }
 return out;
}

const duration = Number(process.argv[2] || 642.4);
writeFileSync("storyboard.vtt", makeVtt(duration));
console.log("wrote storyboard.vtt");

A few cues from the output:

WEBVTT

00:00:00.000 --> 00:00:10.000
storyboard-0.jpg#xywh=0,0,160,90

00:00:10.000 --> 00:00:20.000
storyboard-0.jpg#xywh=160,0,160,90

00:00:20.000 --> 00:00:30.000
storyboard-0.jpg#xywh=320,0,160,90

4. Generate multiple sheets for long video

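To keep each sprite under the browser's image cap, split the extraction by segment. Loop over 1000-second windows and tile each into its own storyboard-N.jpg, which is the naming the VTT generator above already expects (a short video just produces storyboard-0.jpg):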

#!/usr/bin/env bash
# scripts/make_sheets.sh
set -euo pipefail

DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4)
WINDOW=1000 # 100 tiles * 10s
i=0
start=0
# DUR is fractional (e.g. 642.400000), so let bc do the comparison.
while [ "$(echo "$start < $DUR" | bc)" -eq 1 ]; do
  # -y overwrites sheets from a previous run instead of prompting.
  ffmpeg -y -ss "$start" -t "$WINDOW" -i input.mp4 \
    -vf "fps=1/10,scale=160:-1,tile=10x10" \
    -frames:v 1 "storyboard-${i}.jpg"
  i=$((i+1))
  start=$((start+WINDOW))
done

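💡 Put -ss before -i for fast input seeking: FFmpeg jumps to the closest keyframe before the target and, because we're re-encoding, decodes forward to the exact timestamp. That way you're not decoding from the start of the file on every window.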
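Since drift comes from the FFmpeg interval and the VTT interval disagreeing, one option is to build the per-window FFmpeg arguments from the same constants the VTT script uses. A minimal sketch, assuming you launch FFmpeg from Node; CONFIG and sheetCommands are hypothetical names:

```javascript
// Sketch: derive every per-sheet ffmpeg invocation from one set of constants.
// CONFIG and sheetCommands are hypothetical names, not part of any library.
const CONFIG = { interval: 10, tileW: 160, cols: 10, rows: 10 };

function sheetCommands(durationSec, input = "input.mp4", cfg = CONFIG) {
  const windowSec = cfg.cols * cfg.rows * cfg.interval; // 1000 s with the defaults
  const filter = `fps=1/${cfg.interval},scale=${cfg.tileW}:-1,tile=${cfg.cols}x${cfg.rows}`;
  const commands = [];
  // One argument list per window, mirroring the shell loop above.
  for (let i = 0, start = 0; start < durationSec; i++, start += windowSec) {
    commands.push([
      "-ss", String(start), "-t", String(windowSec), "-i", input,
      "-vf", filter, "-frames:v", "1", `storyboard-${i}.jpg`,
    ]);
  }
  return commands;
}

for (const args of sheetCommands(2500)) {
  console.log("ffmpeg", args.join(" "));
}
```

Each array can be passed as the argument list to execFileSync("ffmpeg", args) from node:child_process. Changing the interval or grid in one place then updates the sprite commands, and the VTT script can import the same object.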

5. Wire it into the player

Video.js with the videojs-vtt-thumbnails plugin reads the VTT directly:

<!-- index.html -->
<link href="https://vjs.zencdn.net/8.10.0/video-js.css" rel="stylesheet" />
<video id="player" class="video-js" controls preload="auto" width="800">
 <source src="https://cdn.example.com/video/master.m3u8" type="application/x-mpegURL" />
</video>

// app/player.js
import videojs from "video.js";
import "videojs-vtt-thumbnails";

const player = videojs("player");

player.vttThumbnails({
 src: "https://cdn.example.com/video/storyboard.vtt",
});

That's the whole frontend. The plugin parses the #xywh fragments and crops the sprite as you hover.

Rolling your own on a custom seek bar is just as small: parse the VTT once, then on mousemove over the bar, find the cue whose range contains the hovered time and set the preview element's background-image + background-position from the fragment.

// app/customSeekPreview.js (sketch)
function showPreview(hoverTimeSec, cues, el) {
 const cue = cues.find(c => hoverTimeSec >= c.start && hoverTimeSec < c.end);
 if (!cue) return;
 const { img, x, y, w, h } = cue; // parsed from #xywh
 el.style.width = `${w}px`;
 el.style.height = `${h}px`;
 el.style.backgroundImage = `url(${img})`;
 el.style.backgroundPosition = `-${x}px -${y}px`;
}
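The sketch above takes cues as already parsed. Here is one way that parsing could look for the simple VTT layout this article generates (a timing line followed by a single image#xywh line). parseThumbnailVtt and toSeconds are hypothetical names, and this is not a general WebVTT parser.

```javascript
// Sketch: parse a thumbnail VTT into { start, end, img, x, y, w, h } cue objects.
// Only handles the layout produced by the generator in this article.
function toSeconds(stamp) {
  // Accepts "HH:MM:SS.mmm" or "MM:SS.mmm".
  return stamp.split(":").reduce((total, part) => total * 60 + Number(part), 0);
}

function parseThumbnailVtt(text) {
  const cues = [];
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length - 1; i++) {
    const timing = lines[i].match(/^(\S+)\s+-->\s+(\S+)/);
    if (!timing) continue; // not a timing line (header, blank line, payload)
    const target = lines[i + 1].match(/^(.*)#xywh=(\d+),(\d+),(\d+),(\d+)\s*$/);
    if (!target) continue; // cue without a sprite rectangle
    const [x, y, w, h] = target.slice(2).map(Number);
    cues.push({
      start: toSeconds(timing[1]),
      end: toSeconds(timing[2]),
      img: target[1],
      x, y, w, h,
    });
  }
  return cues;
}

const sample = "WEBVTT\n\n00:00:10.000 --> 00:00:20.000\nstoryboard-0.jpg#xywh=160,0,160,90\n";
console.log(JSON.stringify(parseThumbnailVtt(sample)[0]));
// {"start":10,"end":20,"img":"storyboard-0.jpg","x":160,"y":0,"w":160,"h":90}
```

In the browser you would fetch the VTT once, run it through this, and keep the array around for the mousemove handler. Image names come back exactly as written in the file, so resolve them against the VTT's URL if the two aren't served from the same directory.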

6. Verify alignment before you ship

The single most common bug here is drift: the previews look right at the start and wander off near the end. It's almost always an interval mismatch between FFmpeg and the VTT. Catch it in ten seconds with a spot check instead of discovering it in QA.

Extract the frame your VTT claims is at a known timestamp, and eyeball it against the sprite tile:

# what does the real video look like at 5:00 (300s)?
ffmpeg -ss 300 -i input.mp4 -frames:v 1 -q:v 2 check-300.jpg

Then find the cue covering 300s in storyboard.vtt:

00:05:00.000 --> 00:05:10.000
storyboard-0.jpg#xywh=0,270,160,90

(300s is tile index 30: column 0, row 3, so y = 3 * 90 = 270.)

Crop exactly that rectangle out of the sprite and compare:

ffmpeg -i storyboard-0.jpg -vf "crop=160:90:0:270" tile-check.jpg

check-300.jpg and tile-check.jpg should show the same shot. If they don't, your INTERVAL, TILE_H, or grid math is off by something. Fix it now, because a confidently wrong preview is worse than no preview at all.

💡 Wire this into CI: render the cropped tile and the real frame for three timestamps, and fail the build if they diverge past a similarity threshold. Cheap insurance against a regression in your generator.
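For a scripted version of the spot check, the sheet and crop rectangle for any timestamp can be computed with the same math as the generator, so nobody has to look it up in the VTT by hand. A small sketch; tileFor is a hypothetical helper and the constants are assumed to match the ones in makeVtt.js:

```javascript
// Sketch: which sheet and crop rectangle should hold a given timestamp.
// tileFor is a hypothetical helper; the constants must match makeVtt.js.
const INTERVAL = 10, TILE_W = 160, TILE_H = 90, COLS = 10, ROWS = 10;

function tileFor(timeSec, sheetName = "storyboard") {
  const i = Math.floor(timeSec / INTERVAL);   // global tile index
  const inSheet = i % (COLS * ROWS);          // position within its sheet
  const x = (inSheet % COLS) * TILE_W;
  const y = Math.floor(inSheet / COLS) * TILE_H;
  return {
    sheet: `${sheetName}-${Math.floor(i / (COLS * ROWS))}.jpg`,
    crop: `crop=${TILE_W}:${TILE_H}:${x}:${y}`, // ffmpeg crop filter order is w:h:x:y
  };
}

const { sheet, crop } = tileFor(1234.5);
console.log(`ffmpeg -i ${sheet} -vf "${crop}" tile-check.jpg`);
// ffmpeg -i storyboard-1.jpg -vf "crop=160:90:480:180" tile-check.jpg
```

A CI job could loop this over a few timestamps, run the printed command next to the matching ffmpeg -ss extraction, and hand both images to whatever similarity check you prefer.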

Gotchas checklist

  • [ ] VTT interval must equal the FFmpeg fps=1/N. Mismatch = drift.
  • [ ] Watch the browser image-size cap; split into multiple sheets for long video.
  • [ ] Set TILE_H to your real aspect ratio, or the crop rectangles are wrong.
  • [ ] Cache sprite + VTT on your CDN; they're static and shared across all viewers.
  • [ ] Drop JPEG quality a notch (-q:v 5 or so) - nobody pixel-peeps a hover preview.
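On the TILE_H item: instead of hard-coding 90, you can derive it from the source resolution. The sketch below assumes square pixels and that scale=160:-1 rounds the height to the nearest integer, so treat the result as a starting point and confirm it against the real sprite. tileHeight is a hypothetical helper.

```javascript
// Sketch: derive TILE_H from the source resolution.
// Assumes square pixels and nearest-integer rounding of the scaled height;
// verify against the generated sprite before relying on it.
function tileHeight(srcW, srcH, tileW = 160) {
  return Math.round((tileW * srcH) / srcW);
}

console.log(tileHeight(1920, 1080)); // 90
console.log(tileHeight(640, 480));   // 120
console.log(tileHeight(1920, 800));  // 67
```

A quick way to double-check: the tile filter writes a full grid even when the last rows are empty, so the sprite's pixel height divided by the number of rows should equal TILE_H.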

What's next

  • Hang sprite generation off the same worker that runs your transcode, so every upload gets previews automatically.
  • Try a WebP sprite for smaller files at the same visual quality.
  • The same sprite + VTT pattern powers chapter markers and share-card previews, so the worker pays for itself more than once.