URL: https://dev.to/aulvem/catching-content-rule-violations-at-build-time-with-astro-content-collections-zod-3jp1

Catching content rule violations at build time with Astro Content Collections + Zod


If you run a Markdown-based blog long enough, the frontmatter starts accumulating rules. "A reviews post must carry an ad disclosure." "FAQ questions in JSON-LD must also appear in the body." Eventually a README checklist isn't enough, because sooner or later you forget an item.

Astro Content Collections plus Zod lets you push most of those rules into build failures. .refine() couples two fields, nested z.object types your structured data, and a violation gets caught at astro build.

This post is the code-first version of the setup I use on aulvem.com. The longer version with operational notes is linked at the end.

Minimal setup: defineCollection + z.object

// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({
    pattern: "**/[^_]*.{md,mdx}",
    base: "./src/content/blog",
  }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    category: z.enum(["build", "reviews"]),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    affiliate: z.boolean().default(false),
  }),
});

export const collections = { blog };

Four moves cover most of the surface:

  • z.enum pins the category to a fixed set — typos break the build
  • z.coerce.date reads 2026-05-23 as a Date
  • .default(false) lets you omit the field on the YAML side
  • z.array(z.string()) and other composites work as-is

This is straight out of the Astro 5 docs. The interesting work starts with .refine().
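
To poke at the four moves outside of an Astro build, the same field definitions can be run through plain Zod. This is a minimal sketch, assuming the zod package is installed directly (the z exported by astro:content is a re-export of it). The two sample frontmatter objects and the name postSchema are made up for illustration.

```typescript
import { z } from "zod";

// Same field definitions as the blog schema above, built with plain Zod.
const postSchema = z.object({
  title: z.string(),
  description: z.string(),
  pubDate: z.coerce.date(),
  category: z.enum(["build", "reviews"]),
  tags: z.array(z.string()).default([]),
  draft: z.boolean().default(false),
  affiliate: z.boolean().default(false),
});

// Hypothetical frontmatter with the optional fields left out.
const good = postSchema.safeParse({
  title: "Hello",
  description: "A sample post",
  pubDate: "2026-05-23",
  category: "build",
});

// Same entry, but with a typo in the category.
const bad = postSchema.safeParse({
  title: "Hello",
  description: "A sample post",
  pubDate: "2026-05-23",
  category: "reveiws",
});

if (good.success) {
  // pubDate has been coerced to a Date; the omitted fields carry their defaults.
  console.log(good.data.pubDate instanceof Date, good.data.tags, good.data.draft);
}
if (!bad.success) {
  // Each issue records which field failed.
  console.log(bad.error.issues.map((issue) => issue.path.join(".")));
}
```

safeParse returns a result object instead of throwing, which makes it convenient for experiments like this. Inside astro build the same validation failure surfaces as a build error.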

.refine() for "two fields must move together"

When two fields are coupled — change one, the other must follow — .refine() at the end of the schema is the right shape. Aulvem's case: category: reviews posts must have affiliate: true so the disclosure banner and rel="sponsored" injection both kick in.

const blog = defineCollection({
  loader: glob({ /* ... */ }),
  schema: z
    .object({
      title: z.string(),
      category: z.enum(["build", "reviews"]),
      affiliate: z.boolean().default(false),
      // ...
    })
    .refine((data) => (data.category === "reviews") === data.affiliate, {
      message: "affiliate must be true iff category is 'reviews'",
      path: ["affiliate"],
    }),
});

(A === B) === affiliate reads as "these two are always equal". It is the same logic as a negated XOR (XNOR), and it is easier to scan months later.
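
The predicate is small enough to enumerate. Here is a sketch of its four cases as a plain function with no Zod involved; isConsistent and the row shape are names invented for this illustration.

```typescript
type Category = "build" | "reviews";

// The predicate passed to .refine() above, extracted as a plain function.
const isConsistent = (category: Category, affiliate: boolean): boolean =>
  (category === "reviews") === affiliate;

const categories: Category[] = ["build", "reviews"];
const rows: { category: Category; affiliate: boolean; passes: boolean }[] = [];

// Walk every combination of the two fields.
for (const category of categories) {
  for (const affiliate of [false, true]) {
    rows.push({ category, affiliate, passes: isConsistent(category, affiliate) });
  }
}

console.table(rows);
```

Only build/false and reviews/true pass. The build/true row fails too, so the rule works in both directions: a non-review post that sets affiliate: true is rejected as well.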

Build error from a reviews post that forgot affiliate: true:

[ContentEntryInvalidError] Content config error in `blog → 2026-05-...`:
affiliate must be true iff category is 'reviews'
 at affiliate

message lands in the output verbatim, so it's worth writing it as instructions for future-you.

.refine vs .superRefine

When you need more than one independent constraint on an object — or per-field error messages — .superRefine is easier:

.superRefine((data, ctx) => {
  if (data.category === "reviews" && !data.affiliate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "reviews posts must set affiliate: true",
      path: ["affiliate"],
    });
  }
  if (data.draft && data.updatedDate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "draft posts should not carry updatedDate",
      path: ["updatedDate"],
    });
  }
})

For a single relationship between two fields, .refine() stays lighter.
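
To see what "more than one independent constraint" buys you, the .superRefine callback above can be run against an entry that breaks both rules at once. This is a sketch under assumptions: it uses plain zod rather than astro:content, it declares updatedDate as an optional coerced date (the field is not part of the minimal schema shown earlier), and it passes the string "custom" as the issue code.

```typescript
import { z } from "zod";

const postSchema = z
  .object({
    category: z.enum(["build", "reviews"]),
    affiliate: z.boolean().default(false),
    draft: z.boolean().default(false),
    updatedDate: z.coerce.date().optional(), // assumed field, see lead-in
  })
  .superRefine((data, ctx) => {
    if (data.category === "reviews" && !data.affiliate) {
      ctx.addIssue({
        code: "custom",
        message: "reviews posts must set affiliate: true",
        path: ["affiliate"],
      });
    }
    if (data.draft && data.updatedDate) {
      ctx.addIssue({
        code: "custom",
        message: "draft posts should not carry updatedDate",
        path: ["updatedDate"],
      });
    }
  });

// Hypothetical entry that violates both rules.
const result = postSchema.safeParse({
  category: "reviews",
  draft: true,
  updatedDate: "2026-06-01",
});

// Collect the field path of every reported issue.
const failedPaths: string[] = result.success
  ? []
  : result.error.issues.map((issue) => issue.path.join("."));

console.log(failedPaths);
```

Both issues come back from a single parse, each attached to its own field, so one run reports everything that needs fixing.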

Typed structured data in frontmatter

HowTo and FAQPage JSON-LD blocks pull their data from frontmatter rather than from parsed body text. The reasons:

  • Frontmatter is what Zod validates, so the shape is enforced for free
  • A heading rename doesn't quietly break JSON-LD
  • The JSON-LD generator can trust frontmatter without re-parsing MDX

Schema:

howto: z
  .object({
    name: z.string().optional(),
    description: z.string().optional(),
    totalTime: z.string().optional(),
    steps: z.array(
      z.object({
        name: z.string(),
        text: z.string(),
        image: z.string().optional(),
      }),
    ),
  })
  .optional(),
faq: z
  .array(
    z.object({
      question: z.string(),
      answer: z.string(),
    }),
  )
  .optional(),

YAML side:

---
title: "Astro Content Collections tips"
faq:
  - question: "When do you reach for .superRefine over .refine?"
    answer: "When one object needs more than one independent constraint..."
  - question: "What breaks when the schema changes?"
    answer: "Every existing post by design..."
---

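A faq entry missing answer, or a howto block without a steps key, fails the build. An empty steps: [] list still passes the schema as written, because z.array() accepts empty arrays; chaining .min(1) onto the array closes that gap.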
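
The difference between a missing steps key and an empty steps list is easy to check in isolation. This is a minimal sketch, assuming plain zod; looseHowto, strictHowto and the error message are illustrative names, and the schemas are trimmed to the steps field only.

```typescript
import { z } from "zod";

const step = z.object({ name: z.string(), text: z.string() });

// Same shape as the steps field in the howto schema above.
const looseHowto = z.object({ steps: z.array(step) });

// Variant that additionally requires at least one step.
const strictHowto = z.object({
  steps: z.array(step).min(1, "howto needs at least one step"),
});

const emptySteps = { steps: [] };
const missingSteps = {};

const outcomes = {
  looseEmpty: looseHowto.safeParse(emptySteps).success,
  strictEmpty: strictHowto.safeParse(emptySteps).success,
  looseMissing: looseHowto.safeParse(missingSteps).success,
};

console.log(outcomes);
```

Only looseEmpty comes back true: the plain array accepts an empty list, while both variants reject an object with no steps key at all.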

What Zod can't reach

Zod only inspects frontmatter — the body MDX is outside its scope.

Google's quality guidelines flag JSON-LD without body counterparts as structured-data mismatch and pull the rich-result eligibility. A post with frontmatter FAQs that never appear in the body passes the schema and silently disqualifies itself.

The fix is a separate layer. A small grep-style validator script covers it by checking that each FAQ question appears somewhere in the body:

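import { readFile } from "node:fs/promises";
import { parse as parseYaml } from "yaml";

// Path of the post to check, taken from the first CLI argument.
const path = process.argv[2];
if (!path) {
  console.error("usage: pass the path of a .md or .mdx file");
  process.exit(2);
}

const raw = await readFile(path, "utf8");

// Split the file into frontmatter (m[1]) and body (m[2]).
const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/.exec(raw);
if (!m) process.exit(0); // no frontmatter, nothing to check

const data = parseYaml(m[1]);
// Compare with all whitespace removed and case folded.
const body = m[2].replace(/\s+/g, "").toLowerCase();

const mismatches = [];
if (Array.isArray(data?.faq)) {
  for (const [i, q] of data.faq.entries()) {
    const needle = q.question.replace(/\s+/g, "").toLowerCase();
    if (!body.includes(needle)) {
      mismatches.push(`faq[${i}].question not in body: "${q.question}"`);
    }
  }
}

if (mismatches.length) {
  for (const e of mismatches) console.error(e);
  process.exit(1);
}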

It's substring presence only. The script doesn't catch a wrong answer under the right question — that's a review-time concern.
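
It helps to know exactly what that substring check tolerates. The sketch below reuses the same normalization (strip whitespace, lowercase) on in-memory strings; normalize, bodyContains and the sample bodies are made up for this illustration.

```typescript
// Same normalization as the validator above: drop all whitespace, lowercase.
const normalize = (text: string): string =>
  text.replace(/\s+/g, "").toLowerCase();

// True when the normalized question occurs somewhere in the normalized body.
const bodyContains = (body: string, question: string): boolean =>
  normalize(body).includes(normalize(question));

// A heading wrapped across lines still matches, whatever its casing.
const wrappedBody = "## What breaks when the\nschema changes?\n\nEvery existing post.";
const wrappedHeading = bodyContains(wrappedBody, "what breaks when the schema changes?");

// Inline code markup in the heading adds backticks the frontmatter string lacks.
const markupBody = "## When do you reach for `.superRefine`?";
const markupHeading = bodyContains(markupBody, "When do you reach for .superRefine?");

console.log({ wrappedHeading, markupHeading });
```

wrappedHeading is true and markupHeading is false. Line wrapping and casing are forgiven, but any inline markup inside the heading (backticks, emphasis markers, links) breaks the match, so either keep FAQ headings free of markup or strip those characters during normalization.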

The three-layer split

Once you split rules across three layers, "where should this rule live?" becomes answerable:

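| Layer       | Fires at       | Catches                                          | Misses               |
|-------------|----------------|--------------------------------------------------|----------------------|
| Zod schema  | astro build    | types, enums, required/optional, field relations | meaning, body parity |
| Lint script | pre-commit, CI | banned phrases, substring parity                 | meaning              |
| Review      | pre-publish    | meaning, judgment calls                          | not automatable      |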

Rule of thumb: if a higher layer can catch it, don't push it down.


The full operational notes — the failure modes I keep an eye on, the disclosure-strength judgments, the decision history of why some rules stay in review — live on Aulvem → Pushing operational rules into Astro Content Collections with Zod