URL: https://thenewstack.io/using-llms-right-leveraging-ai-for-augmented-data-quality/

Using LLMs Right: Leveraging AI for Augmented Data Quality

LLMs possess immense potential to revolutionize data quality practices, but only when used precisely and intentionally.
May 12th, 2025 10:00am by Marek Ovcacek
Featured image: photo by Mohammad Rahmani on Unsplash.

Data quality has evolved from a “nice-to-have” to an essential and, in some cases, mission-critical part of a data operation. When AI started to gain traction, many data governance and data science leads saw the writing on the wall. The playing field is leveling, and those who can properly capitalize on their data will win with new use cases. Companies that underinvest in data quality are struggling to keep up.

Initially, AI was seen as a silver bullet, promising to automate complex processes and handle large volumes of data effortlessly. However, it’s become clear that without a strategic framework, AI can be as problematic as it is beneficial.

For a while, it seemed that sheer scale could overcome the classic problem of ‘garbage in, garbage out.’ However, the recent slowdown in progress among the biggest models shows there are limits. The volume of training data is essential, but data quality is quickly becoming a differentiator.

Data Quality in the Modern Data Stack

Organizations have been running data quality programs for decades. Guides, best practices, and experts will tell you how to write data quality rules, deploy them, and prioritize different parts of your data stack. So, it should be easy, right?

Unfortunately, the modern data landscape is anything but simple. What worked ten years ago, when every company had one central enterprise data warehouse or database with a few interconnected systems, doesn’t scale anymore.

Instead of dealing with a couple of systems and data formats, businesses must deal with data landscapes comprising hundreds or thousands of different data systems. While technologies that scale processing to petabytes are available (if you have enough money), the recent push for data democratization means data complexity is about to explode.

This is a widely recognized problem. Gartner even reclassified its Magic Quadrant for Data Quality Solutions as Augmented Data Quality Solutions, adding a focus on automation and scale.

The writing is on the wall. To win the AI race, you must rethink how to approach data quality.

AI Will Fix… AI?

In the old days, to manage growing complexity, you hired more people to take care of more systems. That approach worked for years and led to whole departments of data engineers cranking out data quality rules, configurations, and reports. This approach doesn’t scale anymore, but fortunately, the modern LLM revolution has arrived at the right time.

Consider the following use case: ChatGPT is asked to fix data quality issues in a database of user interactions from various apps, preparing the data for an AI model that will provide personalized recommendations. It shows how we can use AI to fix data for AI.

In this example, we just need to run this request every day as new data comes in and run it on our other systems as well, covering various data sources and various data volumes…

You can probably see the problem here already, and it’s not isolated.

Relying on LLMs to automate complex data cleansing without proper oversight often leads to errors and inconsistencies that only manifest at scale. The example above works well for a specific use case, but it will struggle with scale, consistency and hallucinations.

AI Will Fix AI… but It’s Not That Simple

In its current form, AI can’t scale on raw power alone. You have to be intentional about where and how you use it. Running LLM-based processing across thousands of sources and petabytes of data isn’t feasible. However, we already know how to handle growing data volumes: big data approaches can run at almost any scale, provided they are configured correctly. So, what if we ask ChatGPT a different question?

In this scenario, the model was asked to write data quality rules. Instead of a full data set, it only received a sample and was asked to propose rules to be run on top of the data.
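
As a rough illustration of the sampling step, the sketch below draws a small, reproducible sample and wraps it in a prompt that asks for rules rather than fixes. The function name `build_rule_prompt`, the column names and the prompt wording are all assumptions made for this example; the article does not show the exact prompt, and no model API is called here.

```python
import json
import random

def build_rule_prompt(records, sample_size=5, seed=0):
    """Return (prompt, sample): a prompt asking for data quality rules based on a small sample."""
    rng = random.Random(seed)  # fixed seed so the same sample is drawn on every run
    sample = rng.sample(records, min(sample_size, len(records)))
    lines = [json.dumps(r, sort_keys=True) for r in sample]
    prompt = (
        "Here is a sample of rows from a table of app interactions:\n"
        + "\n".join(lines)
        + "\n\nPropose data quality rules for each column. "
        "Return a JSON list of objects with the keys: column, check, params."
    )
    return prompt, sample

# Hypothetical records; the schema is illustrative only.
records = [
    {"user_id": i, "app": "shop", "event": "click", "ts": f"2025-01-{i + 1:02d}"}
    for i in range(20)
]

prompt, sample = build_rule_prompt(records, sample_size=3)
print(prompt)
```

Only the sample leaves the data platform, so the cost of the request does not grow with the size of the table.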

The rules the model proposed are examples of logic that can be used repeatably with predictable outcomes. The rules scale to any data size and source if implemented in suitable processing technology.
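
To make "repeatable logic with predictable outcomes" concrete, here is a minimal sketch of rules expressed as plain functions. The three rules and the column names are hypothetical stand-ins, not the rules the model actually returned in the article.

```python
import re
from typing import Callable

# Hypothetical rules of the kind a model might propose; not taken from the article.
RULES: dict[str, Callable[[dict], bool]] = {
    "user_id_present": lambda r: r.get("user_id") not in (None, ""),
    "ts_is_iso_date": lambda r: bool(
        re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(r.get("ts", "")))
    ),
    "event_in_allowed_set": lambda r: r.get("event") in {"click", "view", "purchase"},
}

def evaluate(records):
    """Return, for each rule, the indices of the records that fail it."""
    failures = {name: [] for name in RULES}
    for i, rec in enumerate(records):
        for name, check in RULES.items():
            if not check(rec):
                failures[name].append(i)
    return failures

records = [
    {"user_id": 1, "event": "click", "ts": "2025-05-12"},
    {"user_id": None, "event": "view", "ts": "2025-05-12"},
    {"user_id": 3, "event": "scroll", "ts": "12/05/2025"},
]

failures = evaluate(records)
print(failures)
```

Running the same rules on the same data always yields the same result, and the same checks could be ported to whatever engine already processes the data at scale.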

This prompt will only need to be run when the data source or profile changes significantly. You can take this list of newly generated data quality rules and use it as input for rule mapping, applying it to a different dataset without needing to re-generate rule logic. This scales far better than asking AI to detect issues in your data.
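
One way to picture rule mapping, as a sketch under assumed names: the rule logic is written once against logical fields, and each dataset only supplies a mapping from those fields to its own columns. The datasets `crm` and `web`, their column names and the two rules are hypothetical.

```python
# Rule logic is written once against logical field names (hypothetical examples).
RULES = {
    "id_present": ("id", lambda v: v not in (None, "")),
    "email_has_at": ("email", lambda v: isinstance(v, str) and "@" in v),
}

# Per-dataset mapping from logical field to physical column; all names are assumptions.
MAPPINGS = {
    "crm": {"id": "customer_id", "email": "email_address"},
    "web": {"id": "uid", "email": "mail"},
}

def failed_rules(record, dataset):
    """Return the names of the rules that the record fails under the dataset's mapping."""
    mapping = MAPPINGS[dataset]
    failed = []
    for name, (field, check) in RULES.items():
        if not check(record.get(mapping[field])):
            failed.append(name)
    return failed

crm_row = {"customer_id": 7, "email_address": "a@example.com"}
web_row = {"uid": "", "mail": "not-an-email"}

print(failed_rules(crm_row, "crm"))
print(failed_rules(web_row, "web"))
```

Onboarding another source then means adding one entry to the mapping, with no new request to the model.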

Of course, there are caveats. The problems with predictability and hallucinations are not eliminated, and scaling this approach across large data landscapes creates orchestration challenges of its own.
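
One possible guardrail against hallucinated rules, sketched here under assumptions, is to vet whatever the model returns before anything is deployed: accept a proposed rule only if it names a column that exists and a check the rule engine supports. The rule format, the allowlists and the sample model output below are all hypothetical.

```python
import json

ALLOWED_CHECKS = {"not_null", "regex", "in_set"}  # checks the rule engine supports (assumed)
KNOWN_COLUMNS = {"user_id", "event", "ts"}        # columns in the target table (assumed)

def vet_rules(raw_json):
    """Split model-proposed rules into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for rule in json.loads(raw_json):
        ok = (
            isinstance(rule, dict)
            and rule.get("column") in KNOWN_COLUMNS
            and rule.get("check") in ALLOWED_CHECKS
        )
        (accepted if ok else rejected).append(rule)
    return accepted, rejected

# Hypothetical model output: one valid rule, one unknown column, one unsupported check.
model_output = json.dumps([
    {"column": "user_id", "check": "not_null"},
    {"column": "session_len", "check": "not_null"},
    {"column": "ts", "check": "fuzzy_match"},
])

accepted, rejected = vet_rules(model_output)
print(len(accepted), "accepted;", len(rejected), "sent for human review")
```

Rejected rules are kept rather than silently dropped, so a person can decide whether the model invented something or spotted a real gap.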

Use AI Intentionally

The correct application of LLMs for data quality is a strategic imperative. LLMs possess immense potential to revolutionize data quality practices, but only when used precisely and intentionally.

Organizations must engage with AI as part of a broader, well-thought-out data governance strategy. The successful integration of LLMs into data quality processes requires a clear understanding of both the capabilities of these models and the unique challenges of your data landscape.

As we look to the future, the question remains: How will you adapt your data quality strategies to responsibly and effectively leverage AI’s full potential?

Marek Ovcacek, Field CTO at Ataccama, has nearly two decades of experience in the technology industry. With deep expertise in data quality, master data management and data governance, he works closely with strategic customers to help them extract value from...