URL: https://thenewstack.io/the-llm-flywheel-effect-ai-that-writes-and-tests-documentation/

The LLM Flywheel Effect: AI That Writes and Tests Documentation - The New Stack



The LLM Flywheel Effect: AI That Writes and Tests Documentation

How to manage a team of AI assistants in a virtuous cycle of improvement. The LLM flywheel effect is a new workflow for developers to adopt.
Nov 10th, 2025 11:00am by Jon Udell

To help a team member get up to speed on a project, I had to learn and then document how to set up a Mac environment with both Node.js and the .NET runtime. I had never used .NET on a Mac, so the first customer for this piece of documentation was me.

Naturally, I tapped my team of AI assistants, who collectively hold a lot of knowledge about the topic. They wrote instructions, I followed along and reported problems, and we iterated toward the solution.

Then the penny dropped: These AI assistants can not only help write the instructions, but they can also read them and help me reproduce them. I’ve decided to call this the flywheel effect. It’s not automatic; I’ve yet to have the kind of hands-off experience that others report with AI, but that’s not my goal. I don’t want to be out of the loop; I want to be in it efficiently: Start the flywheel spinning, then tap it strategically to build momentum.

The Role of an MCP Server in the AI Workflow

A key enabler for this scenario was a filesystem MCP server that lets agents like Claude and Cursor read and write files. Anthropic’s reference implementation granted the access required to read and write the evolving document. It did not grant access to run the necessary system commands, so I was firmly in the loop: copy/paste the commands they suggested, run them, copy/paste the output, and discuss next steps.

This worked beautifully, modulo the ongoing struggle to manage MCP configuration across a team of assistants. Each has its own configuration file, and although MCP itself is a standard protocol, the locations and formats of these config files are not.
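One way to tame this sprawl is to keep a single canonical server definition and generate each assistant's config file from it. What follows is a minimal sketch under stated assumptions, not a description of how any assistant manages its settings: the target paths and top-level keys are hypothetical values supplied by the caller, and the server command is only an illustrative entry for a filesystem server.

```python
import json
from pathlib import Path

# One canonical definition of the MCP server. The command, package name, and
# project path are illustrative assumptions; substitute your own.
CANONICAL_SERVER = {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"],
}


def render_config(existing: dict, top_key: str, name: str, server: dict) -> dict:
    """Return a copy of `existing` with `server` registered under top_key -> name."""
    merged = dict(existing)
    servers = dict(merged.get(top_key, {}))
    servers[name] = server
    merged[top_key] = servers
    return merged


def sync_configs(targets: dict, name: str, server: dict) -> list:
    """Write the server entry into every target file, preserving other settings.

    `targets` maps an assistant label to a (config path, top-level key) pair.
    Both are hypothetical inputs: check each assistant's documentation for
    the real location and format.
    """
    written = []
    for _assistant, (path, top_key) in targets.items():
        existing = json.loads(path.read_text()) if path.exists() else {}
        path.parent.mkdir(parents=True, exist_ok=True)
        updated = render_config(existing, top_key, name, server)
        path.write_text(json.dumps(updated, indent=2))
        written.append(path)
    return written
```

With something like this, a change to the server definition is made once and fanned out, rather than hand-edited in each assistant's file.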

In How LLMs Guide Us to a Happy Path for Configuration and Coding, I observed that configuration is the new hard problem, one that eclipses cache invalidation, naming, and off-by-one errors. You can enlist AI assistants to debug their own configurations, but I wish people who run our MCP server didn’t have to; it’s a buzzkill. Is there a better way to handle this? If so, please let me know; I’m all ears.

You can also do this kind of thing in a more direct way using Claude Code or Codex. To test that approach, I nuked the installation and asked Claude Code to read the instructions, follow the steps, run all the necessary commands with my permission, evaluate outputs, and produce a final report. Everything got installed, the backend server started, and the frontend app ran successfully. Here’s the report.

We’ve long imagined documentation as a first-class software engineering discipline, but it hasn’t been clear exactly what that would mean. Now the picture is coming into focus. AI assistants can help us not just create documentation, but also test it — just as we test our code. If you’ve ever struggled to write reproducible docs, or been frustrated by installation instructions that don’t work as described, you’ll appreciate the power of this flywheel effect.
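To make "testing documentation like code" concrete, here is a minimal sketch of a purely mechanical version of that loop. It is not what Claude Code does; an agent applies judgment to each output, whereas this stand-in only checks exit codes. It assumes the instructions are written in Markdown with fenced shell blocks and that a POSIX shell is available. All names here are hypothetical.

```python
import re
import subprocess
from dataclasses import dataclass

TICKS = "`" * 3  # fence marker, built this way to avoid a literal fence in this listing
FENCE = re.compile(TICKS + r"(?:sh|bash|shell)\n(.*?)" + TICKS, re.DOTALL)


@dataclass
class StepResult:
    command: str
    returncode: int
    output: str


def extract_commands(markdown: str) -> list:
    """Collect non-empty, non-comment lines from fenced shell blocks."""
    commands = []
    for block in FENCE.findall(markdown):
        for line in block.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                commands.append(line)
    return commands


def run_steps(commands: list, timeout: int = 300) -> list:
    """Run each command in order, stopping at the first failure.

    Each command gets its own shell, so state such as `cd` or `export`
    does not carry over between steps.
    """
    results = []
    for command in commands:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        results.append(StepResult(command, proc.returncode, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break
    return results


def report(results: list) -> str:
    """Summarize the run as one PASS/FAIL line per executed command."""
    return "\n".join(
        f"{'PASS' if r.returncode == 0 else 'FAIL'}  {r.command}" for r in results
    )
```

Run only against instructions you trust, ideally in a disposable environment, since every extracted line is executed as written.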

Iterating on an MCP Server With AI Feedback

When I used Claude to help build the first version of the XMLUI MCP server, I was amazed to find that since Claude was also a client of that server, I could ask it to reflect on the responses it got from the tools provided by the MCP server and then adjust the server code to improve those responses. A major priority was to anchor agents to ground truth, so we arranged for all responses to include dire warnings: invent no syntax, use and recommend only techniques backed by docs that include working examples, always cite the URLs of those docs.

With that guidance, coding agents behave better than they did before, but they often still ignore the guidance and require interactive reminders to follow it. Yelling louder at them won’t help. The root of the problem is that an MCP server has no independent agency. It can try to influence how an agent selects and uses its tools, but ultimately has no control over that selection and use. My hunch is that the MCP server needs to become more agent-like, so it can work with coding assistants’ agents on a more equal footing. An agent-to-agent architecture may lie in the future.

Meanwhile, when I recruited my team of assistants to evaluate the MCP server, they all agreed that the verbose guidance was wasting tokens. Sample feedback: “The ‘MANDATORY’ guidance block is repetitive and uses ~2k tokens per failure.” A related issue was that search results could be too verbose and were also chewing into the token budget.

I picked Kiro to streamline the responses, then I rebuilt the server and asked it to re-evaluate the sample queries that had produced unwieldy responses. Here’s what it saw.

- Failure messages are now concise: Only three bullet points instead of 20+ “MANDATORY” warnings (saves ~1.5k tokens per failure)
- Result truncation working: Successful searches show 20 results with “… X more results omitted …” message (cleaner than the old “use JSON for full list”)
- Snippet truncation: Long lines are truncated at 200 chars with “…”
- All tools affected: The improvements apply to xmlui_search, xmlui_search_howto, and xmlui_examples
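The two truncation rules in that list are simple enough to sketch. The code below is an illustration of the described behavior, not the XMLUI MCP server's actual code, which isn't shown here. The function names are hypothetical, and it assumes each search result is a single-line snippet.

```python
MAX_RESULTS = 20       # results shown before the omission notice
MAX_LINE_CHARS = 200   # lines longer than this are cut


def truncate_line(line: str, limit: int = MAX_LINE_CHARS) -> str:
    """Cut a long line at `limit` characters and mark the cut with an ellipsis."""
    return line if len(line) <= limit else line[:limit] + "…"


def format_results(snippets: list, max_results: int = MAX_RESULTS) -> str:
    """Show at most `max_results` snippets, then say how many were left out."""
    shown = [truncate_line(s) for s in snippets[:max_results]]
    omitted = len(snippets) - len(shown)
    if omitted > 0:
        shown.append(f"… {omitted} more results omitted …")
    return "\n".join(shown)
```

Capping both the number of results and the length of each line puts a predictable ceiling on how much of the token budget a single search can consume.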

Because this is a team effort, I then asked Cursor to do the same evaluation. Here’s what it noticed:

Issue: When there are no results, the code always includes base rule reminders that aren’t useful:

- “Cite sources with file paths and URLs” (no sources to cite)
- “Provide URLs from documentation_urls when available” (no URLs available)

These reminders are useful for successful searches, not failures.

Good point! I relayed the feedback to Kiro, it made the implied changes, and I verified them with a round of feedback from the whole team.
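The change Cursor's feedback implies amounts to choosing reminders based on the outcome of the search. Here is a minimal sketch of that idea, again hypothetical rather than the server's real code. The two success reminders are quoted from the feedback above; the failure wording is an invented placeholder.

```python
# Reminders quoted in the feedback; they only apply when there is something to cite.
SUCCESS_REMINDERS = (
    "Cite sources with file paths and URLs",
    "Provide URLs from documentation_urls when available",
)

# Hypothetical wording for the failure case (an assumption, not the server's text).
FAILURE_REMINDERS = (
    "Do not invent syntax; report that no documentation was found",
)


def build_response(query: str, results: list) -> str:
    """Attach only the reminders that make sense for this outcome."""
    if results:
        body = list(results)
        reminders = SUCCESS_REMINDERS
    else:
        body = [f"No results for '{query}'"]
        reminders = FAILURE_REMINDERS
    return "\n".join([*body, *(f"- {r}" for r in reminders)])
```

Keeping the two reminder sets separate means a failed search no longer pays the token cost of guidance it cannot act on.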

The Human Dev’s Role in the Virtuous Cycle

While I suspect that the nascent agent-to-agent protocol will enable this kind of thing to happen more autonomously, I’m happy to be the coordinator and I don’t think I’d ever want to fully abandon that role.

I’m reminded of the old adage about building a plane while you are flying it. In this case, weirdly and remarkably, the pilot who senses problems is also the mechanic who fixes them. Who am I in this scenario? To torture the metaphor, I guess I am the manager of the airline who sets goals, builds teams, starts the flywheel spinning, and taps it at the right times and in the right ways to accelerate a virtuous cycle of improvement.

Jon Udell is an author and software developer who explores software tools and technologies and explains them in writing, audio, and video. He is the author of the cult classic Practical Internet Groupware. Past gigs include Lotus, BYTE magazine, Safari...
TNS owner Insight Partners is an investor in: Anthropic.