
URL: https://thenewstack.io/use-ai-to-improve-your-organizations-metadata/

Use AI to Improve Your Organization's Metadata - The New Stack



Use AI to Improve Your Organization’s Metadata

By spending time to properly leverage metadata, you can lay the groundwork for a stronger, more relevant AI and big data analytics program.
Oct 31st, 2023 10:00am by Kumar Goswami
Feature Image by Annette from Pixabay.

When you train AI models, the accuracy of the resulting application depends on the quality of the training material it receives. Feeding a model more data than it needs is costly, and feeding it too little produces a poor model. When using AI, you want your results quickly and with minimal cost. The best way to do that is to feed it just the data you need. Yet given the size of unstructured data (multiple petabytes in most enterprises) and its distribution across storage silos, it’s difficult to curate and segment specific data sets.

Enter metadata, which is data about data. Metadata is created automatically by storage technologies and offers insight into your data, such as who owns it, what file type it is, where it lives and who accessed it. This system-level information is extremely useful for managing data, but it lacks the additional context that users and applications often have.
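To make the idea of system-level metadata concrete, here is a minimal Python sketch that reads a few of these attributes from a local file system using only the standard library. The function name `system_metadata` and the selection of fields are assumptions made for illustration. Note that a plain file system records when a file was last accessed, not who accessed it; enterprise storage platforms expose richer information through their own interfaces.

```python
import stat
from datetime import datetime, timezone
from pathlib import Path


def system_metadata(path: str) -> dict:
    """Collect metadata the file system already tracks for one file."""
    p = Path(path)
    info = p.stat()
    return {
        "path": str(p.resolve()),                        # where it lives
        "file_type": p.suffix.lower() or "(none)",       # what type it is
        "size_bytes": info.st_size,
        "owner_uid": info.st_uid,                        # numeric owner ID (0 on Windows)
        "is_regular_file": stat.S_ISREG(info.st_mode),
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
```

Nothing in this output says what the file is about, which is the gap that enriched metadata is meant to fill.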

Additional metadata can enrich this information. Examples include tagging data by its contents (clinical images showing breast cancer versus pancreatic cancer, or images of celebrities or alumni), tagging sensitive information, and tagging information related to a project, geography or demographics (research on females in the Northeast region) or to a particular initiative (manufacturing test data from product X in 2022). Metadata brings structure to unstructured data, which can vastly aid the effort of finding the right data for use in AI tools.

Benefits of Augmenting Metadata with ML

Managing and enriching metadata is a time-consuming process that requires collaboration between IT and departments — data scientists and data owners — to tag data accurately. Tagging adds additional metadata to your file data in the form of key-value pairs, which give context to your data. One example of using multiple tags on a file is: Country = US, Project ID = 123, HIPAA = TRUE. Yet tagging across large data sets manually is virtually impossible. Machine learning-based automation will play a growing and important role in these efforts. Here’s how:

  • Machine learning algorithms can help identify and correct errors or inconsistencies in metadata, improving its overall quality.
  • Machine learning can help automatically tag and categorize data, improving its search, usability and manageability.
  • Enriched metadata delivers new possibilities for business insights from AI, such as sentiment analysis of customer service interactions or discovering new causes of a common medical condition.
  • Machine learning can improve compliance by identifying data that is not secured or stored according to regulations or by analyzing data access patterns that may violate corporate policies.
  • Automation delivers efficiencies and cost savings through reduced manual effort and fewer errors in managing metadata.
  • Competitive advantage from better overall use of data to make more informed decisions or even to unlock new revenue streams. The lion’s share of enterprise data is not leveraged for any purpose but hidden away in storage silos and consuming expensive storage capacity. Metadata can enhance data quality and make data more discoverable for new uses.
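The key-value tags mentioned earlier (Country = US, Project ID = 123, HIPAA = TRUE) can be pictured as a small dictionary attached to each file record. The following is a minimal Python sketch, not the API of any particular product; `FileRecord`, `find_by_tags`, `RECORDS` and the file paths are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    path: str
    tags: dict[str, str] = field(default_factory=dict)


def find_by_tags(records: list[FileRecord], wanted: dict[str, str]) -> list[FileRecord]:
    """Return the records whose tags match every requested key-value pair."""
    return [
        r for r in records
        if all(r.tags.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical sample data using the tag names from the article's example.
RECORDS = [
    FileRecord("/scans/a.dcm", {"Country": "US", "Project ID": "123", "HIPAA": "TRUE"}),
    FileRecord("/scans/b.dcm", {"Country": "DE", "Project ID": "123", "HIPAA": "FALSE"}),
    FileRecord("/docs/notes.txt"),  # untagged file
]

us_hipaa = find_by_tags(RECORDS, {"Country": "US", "HIPAA": "TRUE"})
```

Once the tags exist, curating a data set for an AI tool becomes a filter over tags rather than a scan over file contents.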

Enriching metadata is much more effective with a data management system that can persist that information no matter where the data lives. This way, you do not have to run the AI/ML algorithm repeatedly each time you need the additional context. The enriched metadata lives as long as the data lives. A storage-agnostic data management system can maintain an index of this metadata as your data moves from one storage system to another and provide a simple way to search, curate and extract the right data based on this enhanced metadata.
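One way to picture metadata that outlives a storage move is an index keyed by something other than the file's path. The sketch below keys tags by a SHA-256 digest of the content, which is an assumption made for illustration and not a description of how any specific data management product works. `MetadataIndex` and the storage locations are hypothetical.

```python
import hashlib


class MetadataIndex:
    """Tags keyed by content digest, so they do not depend on a file's location."""

    def __init__(self) -> None:
        self._tags: dict[str, dict[str, str]] = {}
        self._locations: dict[str, str] = {}

    @staticmethod
    def digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def tag(self, content: bytes, location: str, **tags: str) -> str:
        """Record tags for a file and return the key used to find them again."""
        key = self.digest(content)
        self._tags.setdefault(key, {}).update(tags)
        self._locations[key] = location
        return key

    def move(self, key: str, new_location: str) -> None:
        # Only the location changes; the enriched tags stay attached.
        self._locations[key] = new_location

    def lookup(self, key: str) -> tuple[str, dict[str, str]]:
        return self._locations[key], dict(self._tags[key])
```

Because the tags are stored once in the index, moving a file from one system to another does not require running the ML tagging step again.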

Industry Examples

Name an industry and you can imagine how metadata augmentation can deliver powerful benefits. Let’s look at the auto sector. Electric and autonomous vehicles collect large quantities of sensor data, which helps the car adjust and take actions on the fly or issue alerts to the driver. The analysis of this data is white gold for manufacturers for product enhancements and customer behavior analysis.

Using an unstructured data management system, a car manufacturer could create a workflow like this:

  • Find crash test data related to the abrupt stopping of a specific vehicle model.
  • Use an AI tool to identify and tag test data with “Reason = Abrupt Stop”.
  • Move only the related data to a cloud service for analysis.
  • Delete the unrelated data or move it to another cloud service for archives.
  • The process could run continuously as needed.
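The workflow above can be sketched in a few lines of Python. This is an illustration under stated assumptions: `CrashTestFile`, `looks_like_abrupt_stop` and `plan_workflow` are hypothetical names, the keyword check is only a placeholder for a real AI tool, and the function returns a plan instead of moving or deleting anything.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CrashTestFile:
    path: str
    model: str
    summary: str
    tags: dict[str, str] = field(default_factory=dict)


def looks_like_abrupt_stop(f: CrashTestFile) -> bool:
    # Placeholder for a real AI model: a keyword check on the summary text.
    return "abrupt stop" in f.summary.lower()


def plan_workflow(
    files: list[CrashTestFile],
    model: str,
    classifier: Callable[[CrashTestFile], bool] = looks_like_abrupt_stop,
) -> tuple[list[CrashTestFile], list[CrashTestFile]]:
    """Return (to_analyze, to_archive) for one vehicle model's test data."""
    to_analyze, to_archive = [], []
    for f in files:
        if f.model != model:
            continue  # find: only data for the specific vehicle model
        if classifier(f):
            f.tags["Reason"] = "Abrupt Stop"  # tag the matching data
            to_analyze.append(f)              # move to the analysis service
        else:
            to_archive.append(f)              # archive (or delete) the rest
    return to_analyze, to_archive
```

Running the same function on a schedule would correspond to the continuous process in the last step.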

Here are other examples:

  1. Improving customer support: Consider a technology company that uses a machine learning program to run sentiment analysis on call center recordings. The results, such as customer satisfaction scores, are recorded to each audio file with a tag. Now employees can find relevant audio recordings for training and managers can improve best practices.
  2. Medical imaging search: A hospital could apply machine learning to medical images like MRIs, X-rays and CAT scans and then tag the images with diagnosis codes. Researchers can then find images by diagnosis to support their projects.
  3. PII detection and protection: Personal data such as HR files, patient data and financial information could be present within a small subset of the billions of files under management at an enterprise. There’s no easy way to find and isolate it continuously. But if a machine learning program like Amazon Macie analyzes data sets for PII, and a data management system then tags the matching files as “PII” and sends them to secure, immutable storage (or deletes them when possible), the organization saves time and reduces the risk of a breach and fines.
  4. Image search: The marketing leader at a university wants to find images for different campaigns and delete images in its content library that might be inappropriate. The department can use an image AI program that analyzes and tags the images with relevant identifiers so that they can be easily discovered later when needed for different projects. The new metadata tags are stored in a data management system and follow the files even if they move to new storage. The same process could apply to genomics processing, for lab images.
  5. Surveillance/law enforcement: Unstructured data such as body cam and dash cam video, along with social media posts and text messages, is important evidence for criminal investigations. During a case, those files are in active use, but once a case has been closed, they may be hard to find later if the case reopens or if there is a need to analyze them for new purposes, such as crime prevention, training or research projects to improve safety. AI can analyze files and tag them as needed to support those future initiatives.
  6. Copyright protection via metadata: A hot-button topic with generative AI is that copyrighted materials such as artwork, images or books wind up in the training data of programs like ChatGPT. Lawsuits have been on the rise in the wake of this issue. One possible solution is to use tools like Digimarc that allow copyright owners to apply metadata in the form of a digital watermark to their works, which AI models can detect before ingesting them.
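Several of these examples share one pattern: an analysis step inspects the content and its result is written back as a tag. As a rough illustration of the PII case, the sketch below uses two simple regular expressions as a stand-in for a real detection service. It is not how Amazon Macie works, the patterns are deliberately simplistic, and `PII_PATTERNS` and `pii_tags` are hypothetical names.

```python
import re

# Simplistic patterns for illustration only; real PII detection is far broader.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pii_tags(text: str) -> dict[str, str]:
    """Return key-value tags describing whether the text appears to contain PII."""
    found = sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))
    if not found:
        return {"PII": "FALSE"}
    return {"PII": "TRUE", "PII types": ",".join(found)}
```

A data management system could then act on the `PII` tag, for example by routing tagged files to secure storage.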

Technical Considerations

A metadata augmentation project can get out of hand quickly. If you create too many new tags, you must store and manage them appropriately to avoid performance issues with user access. Most IT organizations will need to implement automation for metadata management, given the volume and variety of metadata today.

It’s best to use software that uses a combination of queries and tags. Queries deliver results for common inquiries such as: “Show me all data owned by this department that has been accessed in the last six months.” Users can create any custom queries based on the available metadata. Tags are not needed to save these queries but are used only to enhance the available metadata information using machine learning or user-driven inputs. This query plus tag approach maximizes efficiency, saves time and eliminates the issue of tag proliferation.
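The sample query in this section needs no tags at all, because it relies only on metadata the storage system already records. A minimal sketch, assuming hypothetical `FileMeta` records and approximating six months as 182 days:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileMeta:
    path: str
    department: str
    last_accessed: datetime


def accessed_recently(
    files: list[FileMeta], department: str, now: datetime, days: int = 182
) -> list[FileMeta]:
    """Files owned by a department and accessed within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        f for f in files
        if f.department == department and f.last_accessed >= cutoff
    ]
```

Saving this as a reusable query, instead of tagging every matching file, is what keeps the number of tags from growing unnecessarily.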

It’s also wise to be selective about metadata augmentation. Even with the help of machine learning tools and other systems, it takes time and resources to curate the right data for enrichment, monitor the results for accuracy and safeguard the data from misuse. It also takes work with data stakeholders to ensure that more metadata serves their needs rather than making an AI project more complex or producing false or inaccurate findings. Yet by spending time and using the right tools and resources to understand and properly leverage metadata, IT leaders and data stakeholders can lay the groundwork for a stronger, more relevant AI and big data analytics program.

Kumar Goswami is the CEO of Komprise. He has spent 23+ years delivering products that solve complex IT problems with simplicity and cost efficiency.