
URL: https://thenewstack.io/ai-code-generation-trust-and-verify-always/

AI Code Generation: Trust and Verify, Always - The New Stack


2025-08-22 07:00:20

AI Code Generation: Trust and Verify, Always

Vibe coding must be followed by a rigorous "verify" step to manage the significant security blockers and technical debt coding assistants can generate.
Aug 22nd, 2025 7:00am by Prasenjit A. Sarkar
Featured image: KT Stock photos on Shutterstock.
Sonar sponsored this post. Insight Partners is an investor in Sonar and TNS.

We are at the dawn of a new era in software development. Artificial intelligence is no longer just a tool; it’s becoming a genuine collaborator in the creative process of writing code. This shift promises to unlock unprecedented productivity and innovation. However, like any powerful new tool, this AI collaborator requires a new management philosophy. To truly harness its potential without inheriting its flaws, we must adopt a rigorous principle: trust and verify.

This isn’t about stifling innovation. It’s about enabling it responsibly. As we integrate AI more deeply into the software development life cycle, we must look past the impressive benchmark scores and directly assess the security, reliability and maintainability of the code it produces.

Beyond ‘Does It Work?’

The immediate appeal of large language models (LLMs) is their stunning ability to generate functionally correct code. Top-tier models can solve complex algorithmic problems and produce syntactically valid code with high success rates. This proficiency is driving their rapid adoption. But the critical question for any professional development team isn’t just “Does it work?” It’s “Is it production-ready?”

This is where enthusiasm must be tempered with caution. While LLMs are excellent at solving contained problems, they often lack a grasp of the bigger picture, leading to significant hidden risks.

One of the most pressing concerns is security. New Sonar research, which analyzed code generated by prominent models from providers such as OpenAI, Anthropic and Meta, shows that today’s LLMs have a profound blind spot in this area. For leading LLMs like GPT-4o and Llama 3.2 90B, for instance, we found that a staggering 60% to 70% of the security vulnerabilities they introduce are of ‘BLOCKER’ severity (the highest possible rating). This isn’t a matter of occasional errors, but a structural weakness rooted in their foundational design and training.

Just as critical is the long-term health of the codebase. AI models have an inherent bias toward producing “messy” code that creates technical debt. Our research also showed that, across all the models evaluated, code smells constitute over 90% of all issues found. While the code may function today, this accumulation of structural issues will inevitably lead to a codebase that is difficult and costly to maintain tomorrow.
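To make the two proportions above concrete, here is a minimal sketch of how such shares could be computed from a list of analyzer findings. The `Issue` type, the label strings and the sample data are all hypothetical and invented for illustration; they are not the research data or any tool's real output format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    kind: str      # hypothetical labels: "vulnerability", "bug", "code_smell"
    severity: str  # hypothetical labels: "BLOCKER", "MAJOR", "MINOR", ...


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is nothing to count."""
    return 100.0 * part / whole if whole else 0.0


def summarize(issues: list[Issue]) -> dict[str, float]:
    """Compute the two ratios discussed in the text for one set of findings."""
    kinds = Counter(issue.kind for issue in issues)
    vulns = [i for i in issues if i.kind == "vulnerability"]
    blockers = sum(1 for i in vulns if i.severity == "BLOCKER")
    return {
        # Code smells as a share of ALL issues.
        "code_smell_share": share(kinds["code_smell"], len(issues)),
        # BLOCKER findings as a share of vulnerabilities ONLY.
        "blocker_share_of_vulnerabilities": share(blockers, len(vulns)),
    }


# Made-up sample, not data from the research.
sample = (
    [Issue("code_smell", "MAJOR")] * 18
    + [Issue("vulnerability", "BLOCKER")]
    + [Issue("vulnerability", "MINOR")]
)
summary = summarize(sample)
print(summary)
```

Note that the two figures use different denominators: the code smell share is measured against all issues, while the blocker share is measured against vulnerabilities alone. That is why both can be high at the same time.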

The Myth of a Monolithic AI

It’s a mistake to think of “AI” as a single entity. Just as every human developer has a unique style, different LLMs possess distinct “coding personalities.” Understanding these nuances is key to using them effectively.

For example, our analysis identified clear archetypes. One model, the “senior architect” (Claude Sonnet 4), writes verbose, complex, enterprise-grade code. But this sophistication comes at a price: a strong tendency to introduce difficult-to-diagnose bugs like resource management leaks and concurrency issues. In contrast, the “rapid prototyper” (OpenCoder-8B) is incredibly concise, getting a functional result with minimal code. The trade-off? It exhibits the highest issue density of any model we tested, piling up technical debt and burying projects in long-term maintainability problems.

Choosing a model isn’t just about picking the one with the highest benchmark score. It’s about understanding its inherent style and compensating for its specific weaknesses.

The Paradox of Progress: Smarter Can Mean Riskier

Perhaps the most crucial insight for any leader in this space is a counterintuitive paradox: As models become more capable, they can also become more reckless. The very ambition that allows a newer model to solve more complex problems can lead it to create more severe failures.

We saw this clearly when comparing a model with its direct successor. While the newer model’s benchmark performance improved by 6.3%, it also increased high-severity bugs by 93%. This single data point is a powerful argument against relying on performance scores alone. A model that appears “better” on paper may be introducing a greater level of risk into your applications.
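The comparison above can be sketched as a side-by-side relative change per metric. The metric names and all numbers below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the figures from the research, and the sketch assumes both changes are relative percentages rather than percentage points.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before


def compare(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Relative change for every metric present in both model profiles."""
    return {
        name: relative_change(old[name], new[name])
        for name in old
        if name in new
    }


# Hypothetical profiles for a model and its successor (invented numbers).
old_model = {"benchmark_pass_rate": 80.0, "high_severity_bugs": 100}
new_model = {"benchmark_pass_rate": 84.0, "high_severity_bugs": 150}

changes = compare(old_model, new_model)
print(changes)
```

Reading only the first metric, the successor looks like a modest upgrade. Reading both, the risk profile has moved much further than the score, which is the reason to track them together.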

A New Mandate for Intelligent Oversight

The future of software development is one of human-AI collaboration. To make this partnership successful, we must embrace a “trust and verify” approach. This means implementing a consistent process for reviewing and analyzing every piece of code, regardless of its origin. It dictates that robust governance for security, reliability and maintainability is not a suggestion, but a requirement.

This is especially true in the age of “vibe coding,” where the goal is to get a functional prototype quickly. Our research shows that this initial “vibe” must be followed by a rigorous “verify” step to manage the significant security blockers and technical debt these models can generate. This verification isn’t a bottleneck; it’s the process that transforms a promising prototype into production-ready software.
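As a rough illustration of what a “verify” step might look like in practice, the sketch below gates a change on analyzer findings. The `Finding` type, the `verify` function, the severity labels and the `debt_budget` threshold are all hypothetical; this is one possible policy under those assumptions, not a description of any particular product's quality gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str  # hypothetical labels, e.g. "BLOCKER", "MAJOR", "MINOR"
    file: str


# Assumption: only the highest severity blocks a change outright.
BLOCKING_SEVERITIES = frozenset({"BLOCKER"})


def verify(findings: list[Finding], debt_budget: int = 10) -> tuple[bool, list[str]]:
    """Return (passed, reasons). Fails on any blocking finding, or when the
    number of lower-severity findings exceeds the debt budget."""
    reasons: list[str] = []
    blockers = [f for f in findings if f.severity in BLOCKING_SEVERITIES]
    for f in blockers:
        reasons.append(f"{f.severity}: {f.rule} in {f.file}")
    others = len(findings) - len(blockers)
    if others > debt_budget:
        reasons.append(
            f"{others} lower-severity findings exceed budget of {debt_budget}"
        )
    return (not reasons, reasons)


# Made-up findings for a generated change.
findings = [
    Finding("hard-coded credential", "BLOCKER", "auth.py"),
    Finding("unused variable", "MINOR", "utils.py"),
]
passed, reasons = verify(findings)
print(passed, reasons)
```

The same check applies regardless of who or what wrote the code, which keeps the policy consistent with reviewing every piece of code whatever its origin.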

By expanding our view beyond performance and committing to this deeper level of verification, we can harness the incredible power of AI responsibly. This is how we will build the next generation of software: not just faster, but better, safer and more resilient.

Sonar is the industry standard for code verification and automated code review, trusted by 75% of the Fortune 100. Its SonarQube platform analyzes over 750 billion lines of code daily, helping to prevent outages, reduce risk, lower technical debt, and ensure compliance.
Prasenjit A. Sarkar is product and solutions marketing manager at Sonar. With over 20 years of experience in the technology industry, he is a seasoned technology and product leader who is passionate about building and scaling innovative AI products. He...
TNS owner Insight Partners is an investor in: OpenAI, Anthropic, Sonar.