URL: https://thenewstack.io/what-is-unstructured-data/

What Is Unstructured Data? - The New Stack



What Is Unstructured Data?

A look at the intricacies of unstructured data and methods for processing, analyzing and querying it.
May 22nd, 2023 9:35am by Frank Liu
Zilliz sponsored this post.
This is the first of a three-part series. Our world is becoming ever more digital, and the volume of data keeps growing rapidly. The rise of AI has only accelerated this trend. However, not all data is created equal. By a commonly cited estimate, about 80% of newly generated data is unstructured, and that share is expected to grow as industries and technology advance. Unstructured data is abundant, and it is a rich source of information that can yield useful insights for informed business decisions. So what exactly is unstructured data, and how does it differ from structured and semi-structured data? How can we effectively process, analyze and search it? In this post, we will explore the intricacies of unstructured data and discuss methods for processing, analyzing and querying it.
Zilliz is a leading vector database company, offering high-performing and scalable solutions. We’re powered by Milvus, the popular open source vector database that helps companies of any scale build AI-powered search solutions.

Structured Data vs. Unstructured Data vs. Semi-Structured Data

Let’s start with the three broad categories of data: structured, semi-structured and unstructured.

Structured Data

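Structured data follows a predefined schema, with a fixed set of fields and data types, typically organized into the rows and columns of a relational table. This makes it easy to store and analyze with traditional data management tools and query languages like SQL. Examples of structured data include customer information, transaction records and inventory lists.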
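To make the contrast concrete, here is a minimal sketch of structured data using Python's built-in `sqlite3` module. The `customers` table and its rows are hypothetical; the point is that every row shares the same columns, so a query is just a predicate over those columns.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT, total_spent REAL)"
)
conn.executemany(
    "INSERT INTO customers (name, country, total_spent) VALUES (?, ?, ?)",
    [
        ("Alice", "US", 120.50),
        ("Bob", "DE", 75.00),
        ("Chen", "US", 310.25),
    ],
)

# Every row has the same columns, so filtering is a simple SQL predicate.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = ? AND total_spent >= ? ORDER BY name",
    ("US", 100),
).fetchall()
print(rows)  # [('Alice',), ('Chen',)]
conn.close()
```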

Semi-Structured Data

Semi-structured or partially structured data sits between structured and unstructured data. It has some level of organization, such as tags, keys or metadata, but it does not conform to a rigid, predefined schema: different records can carry different fields. Semi-structured data is commonly found in XML files, JSON documents and other self-describing formats. This type of data is usually stored in a NoSQL database, such as a wide-column store or an object or document database, because it does not map cleanly onto the fixed columns of a relational table (although many relational databases now offer JSON column types).
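For comparison, a minimal sketch of semi-structured data using Python's standard `json` module. The product records below are hypothetical: each one describes itself with its own keys, so the code reading them has to tolerate missing fields instead of relying on a fixed schema.

```python
import json

# Hypothetical product records: self-describing, and the fields differ per record.
raw = """
[
  {"id": 1, "name": "Trail runner", "tags": ["shoes", "outdoor"], "price": 89.99},
  {"id": 2, "name": "Rain jacket", "price": 120.0, "specs": {"waterproof": true}},
  {"id": 3, "name": "Sandal", "tags": ["shoes"]}
]
"""
products = json.loads(raw)

# No fixed schema: use .get() with defaults because keys may be absent.
shoes = [p["name"] for p in products if "shoes" in p.get("tags", [])]
priced = {p["name"]: p.get("price") for p in products}

print(shoes)   # ['Trail runner', 'Sandal']
print(priced)  # {'Trail runner': 89.99, 'Rain jacket': 120.0, 'Sandal': None}
```

The shared keys (`id`, `name`) could become table columns, while the varying ones (`tags`, `specs`) are what make a rigid relational schema awkward here.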

Unstructured Data

Unstructured data is data that does not follow a predefined data model or schema. It is often created by humans, in forms such as text, images, videos, emails and social media posts. However, unstructured data can also include less common examples like protein structures, executable file hashes and human-readable code, among many others. Below are some specific examples of unstructured data, both machine-generated and human-generated.
  • Sensor data: Data collected from various sensors, including temperature, humidity, GPS and motion sensors.
  • Machine log data: Data generated by machines, devices or applications, including system logs, application logs and event logs.
  • Internet of Things (IoT) data: Data collected from smart devices, including smart thermostats, home assistants and wearable devices.
  • Computer vision data: Data generated by computer vision technologies such as image recognition, object detection and video analysis.
  • Natural Language Processing (NLP) data: Data generated by NLP technologies, such as speech recognition, language translation and sentiment analysis.
  • Web and application data: Data generated by web servers, web applications and mobile applications, including user behavior data, error logs and application performance data.
  • Emails: Email messages typically contain unstructured text, images and attachments.
  • Text messages: Text messages can be informal, unstructured and contain abbreviations or emojis.
  • Social media posts: Social media posts can vary in structure and content, including text, images, videos and hashtags.
  • Audio recordings: Human-generated audio recordings include phone calls, voicemails, audio files and voice notes.
  • Handwritten notes: Handwritten notes can be unstructured and may contain drawings, diagrams and other visual elements.
  • Meeting notes: Meeting notes can contain unstructured text, diagrams and action items.
  • Transcripts: Transcripts of speeches, interviews and meetings can contain unstructured text with varying degrees of accuracy.
  • User-generated content: User-generated content on websites and forums can be unstructured data, including free-form text, images and video files.

Analyzing Unstructured Data Is Challenging

Working with unstructured data is challenging because it lacks a standardized format. Querying and analyzing it is also far more complicated than querying and analyzing structured or semi-structured data, where finding or filtering specific items in a database is simple. For instance, to retrieve a book by a particular author from a MongoDB collection, you can use the following code snippet (with the help of `pymongo`; `collection` is assumed to be a `pymongo` `Collection` of book documents).
>>> document = collection.find_one({'Author': 'Bill Bryson'})
This query methodology is similar to that of traditional relational databases, which filter and retrieve data through SQL statements. The basic idea is the same: databases built for structured or semi-structured data filter and query using comparison operators (such as =, <=), pattern matching (such as LIKE) and logical operators (AND, OR, NOT) applied to numbers and strings. In relational databases, the formal foundation for these operations is relational algebra. That’s why they return exactly the records that satisfy a given set of filters.
However, traditional relational databases and data management tools cannot handle the complexities of unstructured data analysis. For instance, suppose a user wants to find similar shoes in a collection of shoe pictures taken from different angles. A relational database cannot capture the nuances of shoe style, size or color from the raw pixel values of those images alone. This poses a significant challenge for industries and companies that rely on unstructured data: how can we transform, store and search unstructured data as effectively as we do structured and semi-structured data?
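A small illustration of why exact-match filtering breaks down for images. The two 3x3 grayscale "images" below are hypothetical pixel grids, the second being the first brightened slightly. An equality check, the kind of predicate a relational filter applies, reports that they differ, while a simple distance reports that they are nearly the same. Raw pixel distance is itself only a crude stand-in: it would still fail for photos of the same shoe taken from different angles, which is the gap learned embeddings are meant to close.

```python
# Two hypothetical 3x3 grayscale "images" (pixel values 0-255).
# image_b is image_a brightened by 2 levels: to a person, the same picture.
image_a = [
    [10, 200, 10],
    [200, 200, 200],
    [10, 200, 10],
]
image_b = [[min(px + 2, 255) for px in row] for row in image_a]

# An exact-match predicate (like a relational equality filter) sees two different items.
exact_match = image_a == image_b

# A distance measure instead reports that they are nearly identical.
flat_a = [px for row in image_a for px in row]
flat_b = [px for row in image_b for px in row]
mean_abs_diff = sum(abs(a - b) for a, b in zip(flat_a, flat_b)) / len(flat_a)

print(exact_match)    # False
print(mean_abs_diff)  # 2.0
```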

How to Search and Analyze Unstructured Data

To address the challenge of analyzing and searching unstructured data, we use specialized software and techniques such as machine learning or, more specifically, deep learning. Machine learning is a branch of artificial intelligence that allows computers to learn patterns from data, including unstructured data, without being explicitly programmed. Many deep learning models convert a single piece of unstructured data into a list of floating-point values, more commonly known as an embedding or embedding vector, so that the data can then be searched and analyzed for insights.
[Figure: How machine learning models process unstructured data]
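The interface described above (a piece of unstructured data in, a fixed-length list of floats out) can be sketched with a deliberately simple stand-in. The `toy_embed` function below is hypothetical and uses feature hashing, counting words into a fixed number of buckets, rather than a neural network, so its vectors carry no real semantic meaning. It only shows that inputs of different lengths map to vectors of the same length.

```python
import math
import zlib

DIM = 8  # hypothetical size; real models use hundreds or thousands of dimensions


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text of any length to a fixed-length, L2-normalized float vector.

    Feature-hashing stand-in for illustration only: it counts words into
    `dim` buckets and captures no meaning, unlike a trained neural network.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        # zlib.crc32 is deterministic across runs (unlike Python's built-in hash()).
        bucket = zlib.crc32(word.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


short = toy_embed("red running shoes")
long_vec = toy_embed("a pair of lightweight red running shoes for trail and road")
print(len(short), len(long_vec))  # 8 8
```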

For example, the widely used ResNet-50 convolutional neural network can represent the image below as a vector of length 2048. The first three and last three elements of this vector are: [0.1392, 0.3572, 0.1988, …, 0.2888, 0.6611, 0.2909].
[Image: photo by Patrice Bouchard]
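The article does not say exactly how the ResNet-50 vector was extracted. A common approach is to drop the network's final classification layer and keep the output of its global average pooling layer: for a 224x224 input, ResNet-50's last convolutional stage produces 2048 feature maps of size 7x7, and averaging each map yields one number per channel, hence a vector of length 2048. The sketch below shows only that pooling step, on random stand-in activations rather than a real network.

```python
import random

# Shape of ResNet-50's last convolutional output for a 224x224 input.
CHANNELS, HEIGHT, WIDTH = 2048, 7, 7

# Random stand-in activations; no actual network is run here.
random.seed(0)
feature_map = [
    [[random.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
    for _ in range(CHANNELS)
]


def global_average_pool(fmap):
    """Average each channel over its spatial grid, giving one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


embedding = global_average_pool(feature_map)
print(len(embedding))  # 2048
```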

Embeddings generated by a properly trained neural network have mathematical properties that make them easy to search and analyze. In particular, embedding vectors for semantically similar objects lie close to each other under measures such as Euclidean distance or cosine similarity. As a result, unstructured data can be searched and analyzed with vector arithmetic: finding items similar to a query becomes a matter of finding the embeddings nearest to the query's embedding.
[Figure: Embedding arithmetic]
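A minimal sketch of how distances between embeddings turn into search, assuming the vectors have already been produced by a model. The 3-dimensional shoe vectors and 2-dimensional word vectors are hand-picked, hypothetical values (real embeddings have hundreds or thousands of dimensions), and `nearest` is a hypothetical brute-force search that scores the query against every item; vector databases speed this step up with approximate nearest-neighbor indexes. The last lines show the classic word-analogy form of embedding arithmetic, king - man + woman ≈ queen.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items, k=2):
    """Brute-force k-nearest-neighbor search: score every item, keep the top k."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-D image embeddings (hand-picked values, not model output).
catalog = {
    "white sneaker (front view)": [0.90, 0.10, 0.20],
    "white sneaker (side view)": [0.85, 0.15, 0.25],
    "leather boot": [0.20, 0.90, 0.30],
    "sandal": [0.30, 0.20, 0.90],
}

# Hypothetical embedding of a new photo of the white sneaker.
query = [0.88, 0.12, 0.22]
print(nearest(query, catalog))
# ['white sneaker (front view)', 'white sneaker (side view)']

# Embedding arithmetic with hypothetical 2-D word vectors:
# axis 0 ~ "royalty", axis 1 ~ "masculinity".
words = {
    "king": [0.9, 0.8],
    "queen": [0.9, 0.2],
    "prince": [0.7, 0.8],
    "princess": [0.7, 0.2],
    "man": [0.1, 0.8],
    "woman": [0.1, 0.2],
}
target = [k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])]
candidates = {w: v for w, v in words.items() if w not in ("king", "man", "woman")}
print(nearest(target, candidates, k=1))  # ['queen']
```

Cosine similarity is used here because it ignores vector length. Many systems instead use Euclidean distance or inner product on L2-normalized vectors, which produce the same ranking.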

Why Should You Work With Unstructured Data?

Even though handling unstructured data can be challenging, it is valuable for developers and businesses. By common estimates, unstructured data makes up roughly 80% of all data, and its share keeps growing in the age of AI. It contains a wealth of information that can provide valuable insights into customer behavior, market trends and other essential business metrics, enabling more accurate decision-making. Thanks to advances in natural language processing and deep learning, managing unstructured data will become easier over time. Working with unstructured data can also reveal hidden patterns and relationships that would be difficult to detect with traditional methods, and it can drive innovation and product development. We’ve already seen breakthrough applications, services and products built on large language models (LLMs), such as OpenAI’s ChatGPT, that extract value from unstructured data, and there will be even more in the future.

Summary

In this post, we covered what unstructured data is and looked at examples of it. We also explored the difficulties of handling and analyzing unstructured data, and the techniques that make it usable for informed business decisions. In my upcoming posts, I will delve deeper into vector databases, a simple yet effective solution for storing, indexing and searching unstructured data using embeddings generated by machine learning models. I will also introduce Milvus, a highly scalable open source vector database, and explain how Milvus can supercharge your AI-powered applications. Stay tuned for more information.
Frank Liu is the director of operations and machine learning architect at Zilliz with over eight years of industry experience in machine learning and hardware engineering. Before joining Zilliz, Frank co-founded an IoT startup based in Shanghai and worked as...
TNS owner Insight Partners is an investor in: OpenAI.