URL: https://dev.to/yushulx/how-to-build-a-javascript-multi-page-document-scanner-web-app-with-auto-capture-and-pdf-export-16ga

How to Build a JavaScript Multi-Page Document Scanner Web App with Auto-Capture and PDF Export


Scanning physical documents with a phone or laptop camera—and getting a clean, de-skewed result instantly—is a common requirement in expense-management, healthcare, and contract-signing workflows. The Dynamsoft Capture Vision SDK handles the heavy lifting (boundary detection, perspective normalization) entirely in the browser via WebAssembly, so no server-side component is needed.

What you'll build: A vanilla-JavaScript web app that opens the device camera, automatically detects document boundaries frame by frame, and captures and normalizes multi-page documents. Users can then edit corners, apply image filters, reorder pages, and export the result as a PDF or as individual PNGs. The app is powered by Dynamsoft Capture Vision Bundle v3.2.5000.

Online Demo

https://yushulx.me/javascript-barcode-qr-code-scanner/examples/multi-document-capture/

Demo Video: Multi-Page Document Scanner in Action

Prerequisites

  • A modern browser with camera access (Chrome, Edge, Safari, Firefox).
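  • A static HTTP server for local development (Python's http.server, npx serve, etc.; npx serve requires Node.js). Browsers only grant camera access in a secure context, so serve the page from localhost or over HTTPS.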
  • A Dynamsoft Capture Vision license key.

Get a 30-day free trial license for Dynamsoft Capture Vision.

Step 1: Load the SDK via CDN

The entire SDK is delivered as a single bundle from jsDelivr — no npm install required. Add both the Dynamsoft Capture Vision bundle and jsPDF to the <head> of index.html:

<script src="https://cdn.jsdelivr.net/npm/dynamsoft-capture-vision-bundle@3.2.5000/dist/dcv.bundle.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/jspdf@2.5.1/dist/jspdf.umd.min.js"></script>

The bundle includes the Document Detection & Normalization (DDN) module, the CaptureVisionRouter, and the LicenseManager — everything needed to go from raw camera frames to normalized document images.

Step 2: Initialize the License and CaptureVisionRouter

After the user submits a license key on the start screen, initSDK activates the license, loads the DDN WebAssembly module, creates a CaptureVisionRouter instance, and loads the preset template file that defines the detection and normalization tasks:

const TEMPLATE_PATH = "./DBR_and_DDN_detect_PresetTemplates.json";
const DETECT_TEMPLATE = "DetectDocumentBoundaries_Default";
const NORMALIZE_TEMPLATE = "NormalizeDocument_Default";

let cvr; // shared CaptureVisionRouter instance, reused by the scan loop and normalizeDocument

async function initSDK(licenseKey) {
  updateInitStatus("Activating license...");
  await Dynamsoft.License.LicenseManager.initLicense(licenseKey, true);

  updateInitStatus("Loading modules...");
  await Dynamsoft.Core.CoreModule.loadWasm(["DDN"]);

  updateInitStatus("Initializing scanner...");
  cvr = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();
  await cvr.initSettings(TEMPLATE_PATH);

  updateInitStatus("Opening camera...");
  await initCamera();
}

The initSettings call loads the templates defined in that file, including DetectDocumentBoundaries_Default and NormalizeDocument_Default, so that later capture() calls can select them by name.

Step 3: Run the Frame-by-Frame Detection Loop

The scan loop calls cvr.capture() on each animation frame, passing the live <video> element and the detection template name. When a four-point quad is returned, it is drawn on an overlay <canvas> and fed into the stabilizer:

async function scanLoop() {
  if (!isScanning) return;

  // Skip this frame if the previous detection or a capture is still running.
  if (isProcessingFrame || isCaptureInProgress) {
    scanLoopId = requestAnimationFrame(scanLoop);
    return;
  }

  isProcessingFrame = true;
  try {
    const result = await cvr.capture(videoElement, DETECT_TEMPLATE);
    let quad = null;

    // Use the first result item that carries a four-point location.
    for (const item of result.items || []) {
      if (item.location && item.location.points && item.location.points.length === 4) {
        quad = item.location.points;
        break;
      }
    }

    if (quad) {
      latestDetectedQuad = quad;
      drawOverlay(quad);
      // ... stabilizer logic and auto-capture
    } else {
      clearOverlay();
      resetStabilizer();
    }
  } catch (_) {
    clearOverlay();
  } finally {
    // Always release the guard, even if capture() throws.
    isProcessingFrame = false;
  }

  if (isScanning) {
    scanLoopId = requestAnimationFrame(scanLoop);
  }
}

The isProcessingFrame guard prevents concurrent captures on devices where WebAssembly inference takes longer than a single animation frame.

Step 4: Stabilize the Detected Quad Before Auto-Capture

Firing a capture the instant a quad is detected produces blurry or mis-framed results. The stabilizer compares consecutive quads using Intersection over Union (IoU) and an area-delta check; only after stableFrameCount consecutive stable frames is the capture triggered automatically:

const quadStabilizer = {
  enabled: true,
  iouThreshold: 0.85,
  areaDeltaThreshold: 0.15,
  stableFrameCount: 3,
};

function isQuadStable(current, previous) {
  if (!current || !previous || current.length !== 4 || previous.length !== 4) return false;
  const boxA = pointsToBoundingBox(current);
  const boxB = pointsToBoundingBox(previous);
  const iou = computeIoU(boxA, boxB);

  const areaA = polygonArea(current);
  const areaB = polygonArea(previous);
  const areaDelta = areaB === 0 ? 1 : Math.abs(areaA - areaB) / areaB;

  return iou >= quadStabilizer.iouThreshold && areaDelta <= quadStabilizer.areaDeltaThreshold;
}

All three thresholds (iouThreshold, areaDeltaThreshold, stableFrameCount) are exposed in an in-app settings panel so users can tune them for their environment.
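The helpers that isQuadStable calls (pointsToBoundingBox, computeIoU, polygonArea) are not shown in this article. The following is a minimal sketch of how they could be written, not the app's actual code: the box shape ({ minX, minY, maxX, maxY }) and the createStabilityCounter helper are assumptions made for illustration.

```javascript
// Axis-aligned bounding box of a quad given as [{ x, y }, ...].
function pointsToBoundingBox(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    minX: Math.min(...xs),
    minY: Math.min(...ys),
    maxX: Math.max(...xs),
    maxY: Math.max(...ys),
  };
}

// Intersection over Union of two axis-aligned boxes (0 = disjoint, 1 = identical).
function computeIoU(a, b) {
  const interW = Math.max(0, Math.min(a.maxX, b.maxX) - Math.max(a.minX, b.minX));
  const interH = Math.max(0, Math.min(a.maxY, b.maxY) - Math.max(a.minY, b.minY));
  const inter = interW * interH;
  const areaA = (a.maxX - a.minX) * (a.maxY - a.minY);
  const areaB = (b.maxX - b.minX) * (b.maxY - b.minY);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Polygon area via the shoelace formula.
function polygonArea(points) {
  let sum = 0;
  for (let i = 0; i < points.length; i++) {
    const p = points[i];
    const q = points[(i + 1) % points.length];
    sum += p.x * q.y - q.x * p.y;
  }
  return Math.abs(sum) / 2;
}

// Hypothetical counter: update() returns true once `required`
// consecutive stable frames have been seen; any unstable frame resets it.
function createStabilityCounter(required) {
  let count = 0;
  return {
    update(isStable) {
      count = isStable ? count + 1 : 0;
      return count >= required;
    },
    reset() {
      count = 0;
    },
  };
}
```

Inside the scan loop, a counter like this could be fed the result of isQuadStable(quad, previousQuad) on every frame (previousQuad being a hypothetical variable that holds the last detected quad), with the capture triggered the first time update() returns true.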

Step 5: Normalize the Document with Perspective Correction

Once a stable quad is confirmed (or the user taps the capture button), normalizeDocument passes the detected boundary points to cvr as the region of interest, runs the normalization template to correct the perspective, and returns the de-skewed image as a <canvas>:

async function normalizeDocument(frameCanvas, points) {
  try {
    const settings = await cvr.getSimplifiedSettings(NORMALIZE_TEMPLATE);
    settings.roi.points = points;
    settings.roiMeasuredInPercentage = 0;
    await cvr.updateSettings(NORMALIZE_TEMPLATE, settings);

    const normalizedResult = await cvr.capture(frameCanvas, NORMALIZE_TEMPLATE);
    for (const item of normalizedResult.items || []) {
      if (item.toCanvas && typeof item.toCanvas === "function") {
        return item.toCanvas();
      }
    }
    return null;
  } catch (_) {
    return null;
  }
}

Setting roiMeasuredInPercentage = 0 tells the router that settings.roi.points are pixel coordinates in the source frame, not percentages.
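To make the pixel-versus-percentage distinction concrete, here is a small sketch of the two coordinate conventions. Both function names are hypothetical, and the 0 to 100 percentage range is an assumption about the SDK's convention that should be checked against the Dynamsoft documentation before relying on it.

```javascript
// Convert pixel-space quad points to percentage space
// (assumed here to run from 0 to 100 on each axis).
function pixelsToPercent(points, frameWidth, frameHeight) {
  return points.map((p) => ({
    x: (p.x / frameWidth) * 100,
    y: (p.y / frameHeight) * 100,
  }));
}

// Clamp pixel-space points into the frame so the ROI never
// extends past the image edges (useful after manual corner edits).
function clampToFrame(points, frameWidth, frameHeight) {
  return points.map((p) => ({
    x: Math.min(Math.max(p.x, 0), frameWidth),
    y: Math.min(Math.max(p.y, 0), frameHeight),
  }));
}
```

Because normalizeDocument sets roiMeasuredInPercentage to 0, only a clamp like the second helper would be relevant there; the conversion is shown purely to illustrate what the other mode would expect.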

Step 6: Apply Image Filters Per Page

After capture, each page can be rendered in color, grayscale, or binary mode. The filter is applied client-side using getImageData / putImageData on a scratch canvas, leaving the original baseCanvas untouched:

function buildFilteredCanvas(page) {
  // Draw the untouched base image onto a scratch canvas.
  const src = page.baseCanvas;
  const out = document.createElement("canvas");
  out.width = src.width;
  out.height = src.height;
  const ctx = out.getContext("2d", { willReadFrequently: true });
  ctx.drawImage(src, 0, 0);

  if (page.filter === "color") {
    return out;
  }

  const image = ctx.getImageData(0, 0, out.width, out.height);
  const data = image.data; // RGBA bytes, four per pixel
  for (let i = 0; i < data.length; i += 4) {
    // Weighted luminance of the RGB channels; alpha is left as-is.
    const gray = Math.round(data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114);
    if (page.filter === "grayscale") {
      data[i] = gray;
      data[i + 1] = gray;
      data[i + 2] = gray;
    } else {
      // Binary mode: fixed threshold at 140.
      const binary = gray > 140 ? 255 : 0;
      data[i] = binary;
      data[i + 1] = binary;
      data[i + 2] = binary;
    }
  }
  ctx.putImageData(image, 0, 0);
  return out;
}
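The per-pixel arithmetic can be tried outside the browser by running it on a plain RGBA byte array. The sketch below extracts the same luminance weights and threshold into a hypothetical filterPixels function (the name and the threshold parameter are not part of the app) so the effect of each mode can be checked on a couple of pixels.

```javascript
// Apply the grayscale or binary filter to an RGBA byte array in place.
// `mode` is "color", "grayscale", or anything else for binary.
function filterPixels(data, mode, threshold = 140) {
  if (mode === "color") return data;
  for (let i = 0; i < data.length; i += 4) {
    const gray = Math.round(data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114);
    const value = mode === "grayscale" ? gray : gray > threshold ? 255 : 0;
    data[i] = value;
    data[i + 1] = value;
    data[i + 2] = value;
    // data[i + 3] (alpha) is intentionally untouched.
  }
  return data;
}

// Two pixels: an orange-brown (200, 100, 50) and pure white.
const sample = new Uint8ClampedArray([200, 100, 50, 255, 255, 255, 255, 255]);
const grayscale = filterPixels(sample.slice(), "grayscale");
const binary = filterPixels(sample.slice(), "binary");
console.log(Array.from(grayscale), Array.from(binary));
```

The first pixel has a luminance of about 124, which is below the 140 threshold, so it turns black in binary mode while the white pixel stays white.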

Step 7: Edit the Document Quad After Capture

If the auto-detected corners are inaccurate, the user can open the edit screen for any captured page, drag the four corner handles to the correct positions, and re-apply perspective normalization without recapturing. openEdit loads the stored originalCanvas and quadPoints into editState, then renderEditCanvas draws the image and overlays draggable circular handles with a 14 px radius:

function openEdit() {
  const page = pages[currentPageIndex];
  if (!page.originalCanvas || !page.quadPoints) {
    showToast("No quad data available for editing.");
    return;
  }
  editState = {
    originalCanvas: page.originalCanvas,
    // Copy the points so dragging does not mutate the stored quad.
    quadPoints: page.quadPoints.map(p => ({ x: p.x, y: p.y })),
    draggingCorner: -1,
    imgRect: null,
  };
  resultScreen.classList.add("hidden");
  editScreen.classList.remove("hidden");
  requestAnimationFrame(renderEditCanvas);
}

function renderEditCanvas() {
  if (!editState) return;
  const rect = editCanvasEl.getBoundingClientRect();
  const cw = Math.round(rect.width);
  const ch = Math.round(rect.height);
  if (cw === 0 || ch === 0) return;
  editCanvasEl.width = cw;
  editCanvasEl.height = ch;

  // Fit the image inside the canvas, preserving aspect ratio, and center it.
  const img = editState.originalCanvas;
  const scale = Math.min(cw / img.width, ch / img.height);
  const imgW = Math.round(img.width * scale);
  const imgH = Math.round(img.height * scale);
  const imgX = Math.round((cw - imgW) / 2);
  const imgY = Math.round((ch - imgH) / 2);
  editState.imgRect = { x: imgX, y: imgY, w: imgW, h: imgH };

  editCtx2.clearRect(0, 0, cw, ch);
  editCtx2.drawImage(img, imgX, imgY, imgW, imgH);

  // Map quad points from image pixels to canvas coordinates.
  const pts = editState.quadPoints.map(p => ({
    x: imgX + (p.x / img.width) * imgW,
    y: imgY + (p.y / img.height) * imgH,
  }));

  // Quad outline with a translucent fill.
  editCtx2.beginPath();
  editCtx2.moveTo(pts[0].x, pts[0].y);
  for (let i = 1; i < 4; i++) editCtx2.lineTo(pts[i].x, pts[i].y);
  editCtx2.closePath();
  editCtx2.fillStyle = "rgba(106,196,187,0.2)";
  editCtx2.fill();
  editCtx2.strokeStyle = "#6ac4bb";
  editCtx2.lineWidth = 2;
  editCtx2.stroke();

  // Corner handles; the one being dragged is highlighted.
  const HANDLE_R = 14;
  pts.forEach((p, i) => {
    editCtx2.beginPath();
    editCtx2.arc(p.x, p.y, HANDLE_R, 0, Math.PI * 2);
    editCtx2.fillStyle = i === editState.draggingCorner ? "#fe8e14" : "#6ac4bb";
    editCtx2.fill();
    editCtx2.strokeStyle = "#fff";
    editCtx2.lineWidth = 2;
    editCtx2.stroke();
  });
}

When the user taps Apply, applyEdit calls normalizeDocument with the updated corner coordinates and replaces the stored baseCanvas in-place — the page is corrected without affecting any other page:

async function applyEdit() {
  if (!editState) return;
  editApplyBtn.disabled = true;
  try {
    const normalized = await normalizeDocument(editState.originalCanvas, editState.quadPoints);
    if (!normalized) {
      showToast("Normalization failed. Adjust corners and try again.");
      return;
    }
    pages[currentPageIndex].baseCanvas = copyCanvas(normalized);
    pages[currentPageIndex].quadPoints = editState.quadPoints.map(p => ({ x: p.x, y: p.y }));
    editState = null;
    editScreen.classList.add("hidden");
    resultScreen.classList.remove("hidden");
    renderResult();
    updateThumbnailBar();
  } catch (e) {
    console.error(e);
    showToast("Edit failed: " + (e?.message || "Unknown error"));
  } finally {
    editApplyBtn.disabled = false;
  }
}
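The pointer handling that moves the handles is not shown above. Dragging needs two pieces of geometry: finding which handle a pointer position hits, and mapping a canvas position back into image pixels (the inverse of the mapping in renderEditCanvas). The following is a rough sketch under those assumptions; canvasToImagePoint and hitTestHandle are hypothetical names, and pointer positions are assumed to be already relative to the canvas (for example clientX minus the canvas's bounding-rect left).

```javascript
// Map a point in canvas coordinates back to source-image pixels,
// clamped to the image bounds. imgRect is { x, y, w, h } as stored
// in editState.imgRect.
function canvasToImagePoint(pt, imgRect, imgWidth, imgHeight) {
  const x = ((pt.x - imgRect.x) / imgRect.w) * imgWidth;
  const y = ((pt.y - imgRect.y) / imgRect.h) * imgHeight;
  return {
    x: Math.min(Math.max(x, 0), imgWidth),
    y: Math.min(Math.max(y, 0), imgHeight),
  };
}

// Return the index of the closest handle within `radius` of the
// pointer, or -1 if no handle is close enough.
function hitTestHandle(pt, handles, radius) {
  let best = -1;
  let bestDist = radius;
  handles.forEach((h, i) => {
    const d = Math.hypot(pt.x - h.x, pt.y - h.y);
    if (d <= bestDist) {
      best = i;
      bestDist = d;
    }
  });
  return best;
}
```

On pointer down, hitTestHandle could set editState.draggingCorner; on pointer move, canvasToImagePoint could update the matching entry in editState.quadPoints before the canvas is redrawn. A hit radius somewhat larger than the drawn 14 px handle tends to be more forgiving on touch screens.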

Step 8: Reorder Pages with Drag-and-Drop

For multi-page sessions, the sort overlay lets users drag thumbnail cards into the desired order. The implementation supports both mouse (dragstart / drop) and touch (touchstart / touchmove / touchend) events. A floating clone follows the finger during a touch drag, and document.elementFromPoint identifies the drop target:

function openSort() {
  if (pages.length < 2) {
    showToast("Need at least 2 pages to reorder.");
    return;
  }
  const workingOrder = pages.map((_, i) => i);
  let dragIdx = -1;

  // Move the entry at position `from` to position `to` and redraw the list.
  function moveAndRebuild(from, to) {
    if (from >= 0 && from !== to) {
      const [moved] = workingOrder.splice(from, 1);
      workingOrder.splice(to, 0, moved);
      rebuild();
    }
  }

  function rebuild() {
    sortList.innerHTML = "";
    workingOrder.forEach((pageIdx, pos) => {
      const item = document.createElement("div");
      item.className = "sort-item";
      item.draggable = true;
      item.dataset.pos = pos;

      // Mouse drag support
      item.addEventListener("dragstart", (e) => {
        dragIdx = pos;
        item.classList.add("dragging");
        e.dataTransfer.effectAllowed = "move";
      });
      item.addEventListener("dragover", (e) => {
        e.preventDefault(); // required, otherwise "drop" never fires
        item.classList.add("drag-over");
      });
      item.addEventListener("dragleave", () => item.classList.remove("drag-over"));
      item.addEventListener("dragend", () => item.classList.remove("dragging"));
      item.addEventListener("drop", (e) => {
        e.preventDefault();
        item.classList.remove("drag-over");
        moveAndRebuild(dragIdx, Number(item.dataset.pos));
      });

      // Touch drag support
      let touchClone = null;
      item.addEventListener("touchstart", (e) => {
        dragIdx = pos;
        const touch = e.touches[0];
        touchClone = item.cloneNode(true);
        touchClone.style.cssText = `position:fixed;left:${touch.clientX - 30}px;top:${touch.clientY - 24}px;width:${item.offsetWidth}px;opacity:0.8;z-index:200;pointer-events:none`;
        document.body.appendChild(touchClone);
        item.classList.add("dragging");
      }, { passive: true });
      item.addEventListener("touchmove", (e) => {
        if (!touchClone) return;
        e.preventDefault(); // keep the page from scrolling during the drag
        const touch = e.touches[0];
        touchClone.style.left = `${touch.clientX - 30}px`;
        touchClone.style.top = `${touch.clientY - 24}px`;
      }, { passive: false });
      item.addEventListener("touchend", (e) => {
        item.classList.remove("dragging");
        if (touchClone) { touchClone.remove(); touchClone = null; }
        const touch = e.changedTouches[0];
        // The clone has pointer-events:none, so this finds the card underneath.
        const overItem = document.elementFromPoint(touch.clientX, touch.clientY)?.closest(".sort-item");
        if (overItem) {
          moveAndRebuild(dragIdx, Number(overItem.dataset.pos));
        }
        dragIdx = -1;
      });
      sortList.appendChild(item);
    });
  }

  rebuild();
  sortOverlay.classList.remove("hidden");

  sortDoneBtn.onclick = () => {
    pages = workingOrder.map(i => pages[i]);
    currentPageIndex = 0;
    sortOverlay.classList.add("hidden");
    renderResult();
    updateThumbnailBar();
  };
}

When the user taps Done, workingOrder (an index remapping array) is applied to pages in a single .map() call, making the reorder non-destructive until confirmed.
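The index-remapping idea can be isolated from the DOM code. The sketch below uses two hypothetical pure functions, moveItem and applyOrder, to show how a drag changes only the order array and how the final order is applied to the page list in one step.

```javascript
// Return a copy of `order` with the entry at `from` moved to `to`.
function moveItem(order, from, to) {
  const next = order.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// Build the reordered list; `items` itself is never mutated.
function applyOrder(items, order) {
  return order.map((i) => items[i]);
}

const demoPages = ["cover", "page A", "page B", "signature"];
let demoOrder = demoPages.map((_, i) => i); // [0, 1, 2, 3]
demoOrder = moveItem(demoOrder, 0, 2);      // drag the first card to the third slot
const reordered = applyOrder(demoPages, demoOrder);
console.log(demoOrder, reordered);
```

Until applyOrder runs, the original array is unchanged, which is why cancelling the sort overlay can simply discard the working order.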

Step 9: Export the Scanned Pages as a PDF

All captured pages (with filters applied) are packed into a single multi-page PDF using jsPDF. Each page is centered and scaled to fit A4:

async function exportPdf() {
  if (!pages.length || !window.jspdf) return;
  const { jsPDF } = window.jspdf;
  const pdf = new jsPDF({ unit: "pt", format: "a4" });

  for (let i = 0; i < pages.length; i++) {
    const canvas = buildFilteredCanvas(pages[i]);
    const imageData = canvas.toDataURL("image/jpeg", 0.95);
    // jsPDF creates the first page automatically; add one for each later page.
    if (i > 0) pdf.addPage("a4", "portrait");

    // Scale to fit the page while keeping the aspect ratio, then center.
    const pageWidth = pdf.internal.pageSize.getWidth();
    const pageHeight = pdf.internal.pageSize.getHeight();
    const ratio = Math.min(pageWidth / canvas.width, pageHeight / canvas.height);
    const drawWidth = canvas.width * ratio;
    const drawHeight = canvas.height * ratio;
    const x = (pageWidth - drawWidth) / 2;
    const y = (pageHeight - drawHeight) / 2;

    pdf.addImage(imageData, "JPEG", x, y, drawWidth, drawHeight);
  }

  // Open the generated PDF in a new tab.
  const blob = pdf.output("blob");
  const blobUrl = URL.createObjectURL(blob);
  window.open(blobUrl, "_blank", "noopener,noreferrer");
}
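The fit-and-center arithmetic in exportPdf can be checked on its own. The sketch below pulls it into a hypothetical fitToPage function and uses round page dimensions for readability rather than the real A4 size in points.

```javascript
// Scale content to fit inside a page while preserving aspect ratio,
// and return the centered placement rectangle.
function fitToPage(contentWidth, contentHeight, pageWidth, pageHeight) {
  const ratio = Math.min(pageWidth / contentWidth, pageHeight / contentHeight);
  const drawWidth = contentWidth * ratio;
  const drawHeight = contentHeight * ratio;
  return {
    x: (pageWidth - drawWidth) / 2,
    y: (pageHeight - drawHeight) / 2,
    drawWidth,
    drawHeight,
  };
}

// A wide scan is limited by page width and centered vertically.
const wide = fitToPage(1200, 800, 600, 800);
// A tall scan is limited by page height and centered horizontally.
const tall = fitToPage(400, 1600, 600, 800);
console.log(wide, tall);
```

Note that the ratio is not capped at 1, so a scan smaller than the page is enlarged to fill it. Wrapping the ratio in Math.min(1, ...) would be one way to avoid upscaling low-resolution captures, if that is preferred.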

Individual pages can also be downloaded as PNG files via the Export as Images option, which calls canvas.toDataURL("image/png") for each page.
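The image export code is not shown in this article. One common way to do it, sketched here as an assumption rather than the app's actual implementation, is to create a temporary link with a download attribute for each page. The function names and the page-01.png naming scheme are hypothetical.

```javascript
// Hypothetical naming scheme: page-01.png, page-02.png, ...
// Pads to at least two digits, or more if the page count needs it.
function pageFileName(index, total) {
  const digits = Math.max(2, String(total).length);
  return `page-${String(index + 1).padStart(digits, "0")}.png`;
}

// Browser-only: trigger one download per canvas through a
// temporary <a download> element.
function downloadCanvasesAsPng(canvases) {
  canvases.forEach((canvas, i) => {
    const link = document.createElement("a");
    link.href = canvas.toDataURL("image/png");
    link.download = pageFileName(i, canvases.length);
    document.body.appendChild(link);
    link.click();
    link.remove();
  });
}
```

With the functions from the earlier steps, this could be called as downloadCanvasesAsPng(pages.map(buildFilteredCanvas)). Some browsers ask for permission before allowing several downloads in a row, so a production version may need to handle that case.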

Source Code

https://github.com/yushulx/javascript-barcode-qr-code-scanner/tree/main/examples/multi-document-capture