URL: https://alexop.dev/posts/mutation-testing-ai-agents-vitest-browser-mode/

Mutation Testing with AI Agents When Stryker Doesn't Work | alexop.dev


The Coverage Lie#

Code coverage lies. A test that executes a line does not necessarily verify that the line does the right thing:

function add(a: number, b: number): number {
 return a + b
}

// 100% coverage - would still pass if add() returned 999
it('adds numbers', () => {
 add(2, 2)
})

Mutation testing flips the question. Instead of asking “did tests run this code?”, it asks “if I break this code, do tests fail?”

Using our add example, a mutation tester would make a change like this:

// Original
function add(a: number, b: number): number {
 return a + b
}

// Mutated: swap + for -
function add(a: number, b: number): number {
 return a - b // <-- bug introduced
}

Now run the test. add(2, 2) returns 0 instead of 4. Does the test fail? No—it never checked the result. The mutant survives. Your test has a gap.

The process:

  1. Mutate: Introduce a small bug (change > to >=, swap && for ||, delete a line)
  2. Run tests: Execute your test suite against the mutated code
  3. Evaluate: If tests pass with the bug, your tests are weak. If tests fail, they caught it.

A mutation that tests fail to catch is a “surviving mutant”—proof of a test gap.
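
The three steps above can be sketched in a few lines of TypeScript. This is only an illustration, not how Stryker or an agent works internally: the mutants are hand-written alternative implementations of add, and the names weakSuite, strictSuite, and runMutationTest are hypothetical.

```typescript
type BinaryFn = (a: number, b: number) => number
// A "suite" here is a function that returns true when all of its checks pass.
type Suite = (impl: BinaryFn) => boolean

const add: BinaryFn = (a, b) => a + b

// Hand-written mutants standing in for what a mutation tool would generate.
const mutants: Record<string, BinaryFn> = {
  'swap + for -': (a, b) => a - b,
  'swap + for *': (a, b) => a * b,
  'return constant 999': () => 999,
}

// Mirrors the weak test from earlier: it calls the function but asserts nothing.
const weakSuite: Suite = (impl) => {
  impl(2, 2)
  return true
}

// Asserts on results, using inputs where +, - and * give different answers.
const strictSuite: Suite = (impl) => impl(2, 3) === 5 && impl(0, 7) === 7

function runMutationTest(suite: Suite): { killed: string[]; survived: string[] } {
  const killed: string[] = []
  const survived: string[] = []
  for (const [name, mutant] of Object.entries(mutants)) {
    // A mutant survives when the suite still passes against the broken code.
    if (suite(mutant)) survived.push(name)
    else killed.push(name)
  }
  return { killed, survived }
}

const weakResult = runMutationTest(weakSuite)
const strictResult = runMutationTest(strictSuite)
console.log('weak suite survivors:', weakResult.survived)
console.log('strict suite survivors:', strictResult.survived)
```

Note the choice of inputs in strictSuite: asserting only that add(2, 2) equals 4 would still let the multiplication mutant survive, because 2 * 2 is also 4.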


When Stryker Works: The Gold Standard#

When your test stack supports it, automated mutation testing with Stryker is the way to go. It’s fast, deterministic, generates HTML reports, and runs in CI pipelines. This is especially valuable when you have pure functions with high test coverage but want to verify test quality.

Here’s what it looks like in practice:

pnpm test:mutation
# or: stryker run
INFO ProjectReader Found 7 of 2947 file(s) to be mutated.
INFO Instrumenter Instrumented 7 source file(s) with 394 mutant(s)
INFO DryRunExecutor Initial test run succeeded. Ran 184 tests in 0 seconds.

Mutation testing [====================] 100% | 394/394 Mutants tested
(35 survived, 0 timed out)

--------------|---------|----------|----------|----------|
File | % score | # killed | # survived | # no cov |
--------------|---------|----------|----------|----------|
All files | 90.86 | 358 | 35 | 1 |
 backlinks.ts | 96.30 | 26 | 1 | 0 |
 callouts.ts | 93.94 | 62 | 4 | 0 |
 graph.ts | 91.55 | 65 | 6 | 0 |
 mentions.ts | 91.30 | 63 | 5 | 1 |
 minimark.ts | 82.61 | 76 | 16 | 0 |
 text.ts | 100.00 | 34 | 0 | 0 |
 wikilinks.ts | 91.43 | 32 | 3 | 0 |
--------------|---------|----------|----------|----------|

INFO MutationTestExecutor Done in 36 seconds.

394 mutants tested across 7 files in 36 seconds. The report shows exactly which files have weak spots—minimark.ts at 82.61% needs attention, while text.ts is solid at 100%.
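
The score column can be reproduced by hand. The sketch below assumes the definition given in Stryker's documentation on mutant states and metrics, where the score is detected mutants (killed plus timed out) divided by all valid mutants (detected plus survived plus no coverage). The mutationScore helper and MutantCounts type are illustrative names, not part of Stryker's API.

```typescript
interface MutantCounts {
  killed: number
  timedOut: number
  survived: number
  noCoverage: number
}

function mutationScore(counts: MutantCounts): number {
  const detected = counts.killed + counts.timedOut
  const valid = detected + counts.survived + counts.noCoverage
  // No valid mutants means there is nothing to score.
  if (valid === 0) return NaN
  return (detected / valid) * 100
}

// Numbers taken from the "All files" and minimark.ts rows of the report above.
const allFiles = mutationScore({ killed: 358, timedOut: 0, survived: 35, noCoverage: 1 })
const minimark = mutationScore({ killed: 76, timedOut: 0, survived: 16, noCoverage: 0 })

console.log(allFiles.toFixed(2)) // 90.86
console.log(minimark.toFixed(2)) // 82.61
```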

Stryker also generates an interactive HTML report where you can drill into each surviving mutant and see exactly what code change your tests failed to catch.

💪Use Stryker When You Can

If your stack supports Stryker (standard Vitest in Node mode, Jest, Mocha), use it. Deterministic tooling in your CI pipeline beats manual approaches every time. The AI agent technique in this post is for when Stryker isn’t an option.


The Vitest Browser Mode Problem#

But what if Stryker doesn’t support your stack? Stryker doesn’t work with Vitest’s browser mode. Its instrumentation assumes Node.js execution, but browser mode runs tests in a real Chromium browser via Playwright.

My setup:

  • Framework: Vitest 4 with browser.enabled: true
  • Provider: Playwright (Chromium)
  • Test style: Integration tests with real DOM

My testing strategy (described in “Vue 3 Testing Pyramid: A Practical Guide with Vitest Browser Mode”) relies heavily on Vitest browser mode for realistic user flow testing. Stryker’s mutation coverage reports are not an option here, and switching to Node-based testing would mean losing the browser-specific behavior I’m actually testing.


AI Agents as Manual Mutation Testers#

The mutation testing algorithm is simple enough that an AI coding agent can execute it manually. Claude Code can:

  1. Read your source code
  2. Apply mutations systematically
  3. Run pnpm test --run
  4. Record whether tests passed or failed
  5. Restore the original code
  6. Report surviving mutants with suggested fixes

I adapted a Claude Code skill originally created by Paul Hammond that codifies this workflow. For background on skills, see “Claude Code Customization Guide (2026): CLAUDE.md vs Skills vs Subagents”.

The Mutation Testing Skill#

The skill defines mutation operators in priority order:

Priority 1 - Boundaries (most likely to survive):

Original | Mutate To
< | <=
> | >=
<= | <
>= | >

Priority 2 - Boolean Logic:

Original | Mutate To
&& | ||
|| | &&
!condition | condition

Priority 3 - Return Values:

Original | Mutate To
return x | return null
return true | return false
Early return | Remove it

Priority 4 - Statement Removal:

Original | Mutate To
array.push(x) | Remove
await save(x) | Remove
emit('event') | Remove
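
As a rough illustration of how the boundary and boolean operators above turn into concrete mutants, the sketch below applies each swap to a source string, one occurrence at a time. It is a naive text-based approach with hypothetical names (OPERATOR_SWAPS, generateMutants). It is not the skill's implementation, and a real tool would work on the syntax tree so that generics such as Array<number>, shift operators, or operators inside strings are not mutated by accident.

```typescript
interface Mutant {
  original: string
  replacement: string
  mutated: string
}

// Priority 1 (boundaries) and Priority 2 (boolean logic) swaps from the tables.
const OPERATOR_SWAPS = new Map<string, string>([
  ['<', '<='],
  ['>', '>='],
  ['<=', '<'],
  ['>=', '>'],
  ['&&', '||'],
  ['||', '&&'],
])

function generateMutants(source: string): Mutant[] {
  // '=>' is listed first so an arrow function is consumed as one token and skipped.
  const tokenPattern = /=>|<=|>=|&&|\|\||<|>/g
  const mutants: Mutant[] = []
  let match: RegExpExecArray | null
  while ((match = tokenPattern.exec(source)) !== null) {
    const original = match[0]
    const replacement = OPERATOR_SWAPS.get(original)
    if (replacement === undefined) continue
    // Each mutant changes exactly one occurrence and leaves the rest untouched.
    const before = source.slice(0, match.index)
    const after = source.slice(match.index + original.length)
    mutants.push({ original, replacement, mutated: before + replacement + after })
  }
  return mutants
}

for (const m of generateMutants('if (a < b && c >= d) return x')) {
  console.log(`${m.original} -> ${m.replacement}: ${m.mutated}`)
}
```

Producing one mutant per occurrence matters: if several changes were applied at once, a failing test would not tell you which change it caught.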

The agent applies each mutation one at a time, runs tests, records results, and restores the original code immediately.
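
The apply, test, restore cycle is the part most worth getting right, because a crash mid-run must not leave a mutated file behind. Here is a minimal sketch, assuming an in-memory file map and an injected test runner in place of real file I/O and pnpm test --run. All names (Mutation, testMutation, runTests) are hypothetical.

```typescript
interface Mutation {
  file: string
  original: string // exact text to replace (first occurrence only)
  replacement: string
}

type Outcome = 'killed' | 'survived' | 'skipped'

function testMutation(
  files: Map<string, string>,
  mutation: Mutation,
  runTests: () => boolean, // returns true when the whole suite passes
): Outcome {
  const source = files.get(mutation.file)
  if (source === undefined || !source.includes(mutation.original)) return 'skipped'

  // A replacer function avoids special handling of '$' in the replacement text.
  files.set(mutation.file, source.replace(mutation.original, () => mutation.replacement))
  try {
    // Passing tests mean the bug went unnoticed, so the mutant survived.
    return runTests() ? 'survived' : 'killed'
  } finally {
    // Restore the original even if the test runner throws.
    files.set(mutation.file, source)
  }
}

const files = new Map<string, string>([
  ['settings.ts', 'export const clamp = (v: number) => Math.min(Math.max(v, 0.5), 1)'],
])
// Stand-in for a suite that never checks the lower bound, so it always passes.
const outcome = testMutation(
  files,
  { file: 'settings.ts', original: '0.5', replacement: '0.4' },
  () => true,
)
console.log(outcome) // survived
```

With real files, the same shape applies: read the source once, write the mutated version, and put the restore in a finally block so an interrupted run cannot leave a bug in the working tree.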


Real Example: Settings Feature#

I ran this against my settings feature. The integration tests looked comprehensive—theme toggling, language switching, unit preferences. Code coverage would show high percentages.

Results: 38% mutation score (5 killed, 8 survived out of 13 mutations)

Here’s what the AI agent found:

Surviving Mutant #1: Volume Boundary Not Tested#

// Original (stores/settings.ts:65)
Math.min(Math.max(volume, 0.5), 1)

// Mutation: Change 0.5 to 0.4
Math.min(Math.max(volume, 0.4), 1)

// Result: Tests PASSED -> Mutant SURVIVED

My tests never verified the minimum volume constraint. A bug changing the minimum from 50% to 40% would ship undetected.
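
To see which inputs can kill this mutant at all, here is a small standalone sketch. clampVolume is a hypothetical extraction of the expression above, and the mutant is written out by hand.

```typescript
// Hypothetical extraction of the clamp expression from stores/settings.ts.
const clampVolume = (volume: number): number => Math.min(Math.max(volume, 0.5), 1)
// The surviving mutant: lower bound changed from 0.5 to 0.4.
const mutantClamp = (volume: number): number => Math.min(Math.max(volume, 0.4), 1)

// An input kills the mutant only if original and mutant disagree on it.
function killingInputs(probes: number[]): number[] {
  return probes.filter((v) => clampVolume(v) !== mutantClamp(v))
}

const probes = [0.45, 0.5, 0.8, 1.2]
for (const v of probes) {
  console.log(v, 'original:', clampVolume(v), 'mutant:', mutantClamp(v))
}
console.log('inputs that kill the mutant:', killingInputs(probes))
```

Only inputs below 0.5 separate the two versions, so a test that sets a value under the minimum and asserts that the stored volume is 0.5 targets the mutated expression directly.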

Surviving Mutant #2: Theme DOM Class Not Verified#

// Original (composables/useTheme.ts:26)
newMode === 'dark'

// Mutation: Negate the condition
newMode !== 'dark'

// Result: Tests PASSED -> Mutant SURVIVED

My test checked that clicking the toggle changed the stored preference. It never verified that document.documentElement.classList actually received the dark class. The UI could break while tests pass.

Surviving Mutant #3: Error Handling Path Untested#

// Original (stores/settings.ts:28)
if (error) return

// Mutation: Negate the condition
if (!error) return

// Result: Tests PASSED -> Mutant SURVIVED

No test exercised the error handling branch. A bug that inverted error handling would go unnoticed.
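
One possible shape for a test that closes this gap is sketched below, under heavy assumptions. The real code around if (error) return is not shown in this post, so applyLoadResult, LoadResult, and SettingsState are hypothetical stand-ins for whatever the store does after loading settings.

```typescript
interface SettingsState {
  theme: 'light' | 'dark'
}

interface LoadResult {
  error?: Error
  data?: SettingsState
}

type ApplyFn = (state: SettingsState, result: LoadResult) => SettingsState

// Hypothetical original: on error, keep the current state and ignore any payload.
const applyLoadResult: ApplyFn = (state, result) => {
  if (result.error) return state
  return result.data ?? state
}

// Hand-written mutant with the condition negated.
const applyLoadResultMutant: ApplyFn = (state, result) => {
  if (!result.error) return state
  return result.data ?? state
}

const current: SettingsState = { theme: 'light' }
// The failed result carries a payload that must be ignored when error is set.
const failed: LoadResult = { error: new Error('storage unavailable'), data: { theme: 'dark' } }
const loaded: LoadResult = { data: { theme: 'dark' } }

// Covers both branches: the error path keeps state, the success path applies data.
function errorHandlingSuite(impl: ApplyFn): boolean {
  const keepsStateOnError = impl(current, failed).theme === 'light'
  const appliesDataOnSuccess = impl(current, loaded).theme === 'dark'
  return keepsStateOnError && appliesDataOnSuccess
}

console.log('original passes:', errorHandlingSuite(applyLoadResult))
console.log('mutant passes:', errorHandlingSuite(applyLoadResultMutant))
```

The error-path check only has teeth because the failed result still carries data. If the error case and the success case produce the same observable state, the negated condition cannot be detected from that branch.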

The Fixes#

The agent suggested specific tests for each surviving mutant:

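// Vitest 4 exposes the browser helpers from 'vitest/browser'
// (earlier versions import them from '@vitest/browser/context').
import { expect, it } from 'vitest'
import { page, userEvent } from 'vitest/browser'

// Fix for Mutant #1: Boundary test
it('volume slider has minimum value constraint of 50%', async () => {
  const volumeSlider = page.getByTestId('timer-sound-volume-slider')
  // Poll until the slider is rendered and exposes the expected min attribute
  await expect.poll(async () => {
    const el = await volumeSlider.element()
    return el.getAttribute('min')
  }).toBe('0.5')
})

// Fix for Mutant #2: DOM verification
it('adds dark class to html element when dark mode enabled', async () => {
  const themeToggle = page.getByTestId('theme-toggle')
  await userEvent.click(themeToggle)

  // Assert on the DOM side effect, not only on the stored preference
  await expect.poll(() =>
    document.documentElement.classList.contains('dark')
  ).toBe(true)
})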

How to Set This Up#

Step 1: Create the Skill#

Save this as .claude/skills/mutation-testing/SKILL.md:

Step 2: Invoke It#

claude "Run mutation testing on the settings feature"

The agent will:

  • Find changed files on your branch
  • Identify testable functions
  • Apply mutations systematically
  • Report surviving mutants with suggested test fixes

Step 3: Review and Fix#

The agent produces a markdown report. Review each surviving mutant and decide:

  • Add the suggested test
  • Accept the risk (document why)
  • Refactor the code to be more testable

When to Use This Approach#

Good Fit | Not Ideal
Vitest browser mode (no Stryker support) | Large codebases needing full mutation coverage
Playwright component testing | CI/CD automation (manual agent invocation)
Small-to-medium codebases | Strict mutation score thresholds
Pre-merge review of specific features |
Learning what makes tests effective |

💪Complement, Don't Replace

This approach works best alongside your existing testing strategy. Use it to spot-check critical features before merge, not as a replacement for automated mutation testing where available.

Feature Branches, Not Pipelines

This skill shines on feature branches where you want to validate test quality before merging. Running AI agents in CI/CD pipelines is possible (you could build an automated QA agent with the Claude Agent SDK, as described in “Building an AI QA Engineer with Claude Code and Playwright MCP”), but it adds complexity and cost. For pipeline automation, deterministic tools like Stryker remain the better choice when your stack supports them. Think of this as a developer tool for improving tests during development, not a CI gate.


Key Takeaways#

  1. Coverage doesn’t equal confidence. High code coverage can coexist with ineffective tests.

  2. Mutation testing reveals test gaps. By breaking code and checking if tests notice, you find what’s actually being verified.

  3. AI agents can execute manual mutation testing. When tooling doesn’t support your stack, an agent can apply the algorithm systematically.

  4. Focus on surviving mutants. Each one is a potential bug your tests wouldn’t catch.

  5. This complements rather than replaces. Use it alongside coverage reports, and prefer automated mutation testing where your stack supports it.

