URL: https://thenewstack.io/the-next-wave-of-big-data-companies-in-the-age-of-chatgpt/

2023-03-24 07:40:54
The Next Wave of Big Data Companies in the Age of ChatGPT
Data

Just as cloud computing ushered in a raft of 'big data' solutions, generative AI is a catalyst for a new wave of data intelligence companies.
Mar 24th, 2023 7:40am by Richard MacManus

Remember the catchphrase “Big Data”? It spawned many successful companies in the cloud computing era, such as Snowflake, Databricks, DataStax, Splunk and Cloudera. But we’re now in the AI era, and machine learning software is supposedly at or near “intelligence” (even if it is prone to hallucinating; but then, aren’t we all?).

So given the current AI boom, do we even need “big data” companies that sort and organize the world’s data? Can’t the AI do that for us now?

To find out how data companies are adapting to the AI age, I spoke to Aaron Kalb, a co-founder of Alation, which styles itself as a “data intelligence” platform and promotes a concept it calls the “data catalog.” This combines “machine learning with human curation” to create a custom store of data for enterprise companies.

How ChatGPT Differs from Siri in the 2000s

Before co-founding Alation with ex-Oracle executive Satyen Sangani, Kalb worked at Apple on its Siri software. Siri was perhaps the first mainstream software application to make use of AI language modeling. So I asked him how the current generation of generative AI software (such as ChatGPT and Google Bard) differs from what Siri was doing in the late 2000s.

“Siri had a difficult job at first, because they didn’t have conversational training data at the time,” he replied. “They were the first voice assistant.” The corpus that the language models for Siri were trained on was much smaller than the training data of large language models (LLMs) today — Kalb called Siri’s training data a “journalistic corpus.”

On top of relatively poor training data, Siri didn’t use much machine learning. Kalb says that Siri made a lot of mistakes in both voice-to-text and text-to-intent. “And I think to this day, Siri, Alexa, Cortana and Google Assistant, all have struggled,” he added.

Why Does AI Hallucinate?

All that said, it’s not as if generative AI is perfect either. I asked Kalb what he makes of the current issues with hallucinations (making up facts) that affect software like ChatGPT and Bard.

Kalb suggests that it’s a “psychological phenomenon” for the human users of generative AI, more than an issue with the software itself.

“For many kinds of prompts, it really seems as though it is understanding the prompt and formulating an answer and then putting it into words,” he said, regarding ChatGPT and similar software. “And it’s just so impressive. We think that it has understanding and true intelligence. What it’s actually doing is [that] it’s basically a super sophisticated Markov model, where it’s saying, hey, what’s the next word given the prior words it said, the prompt before that, and then the entire internet probabilistic distribution of words before that.”
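
To make Kalb’s analogy concrete, here is a minimal Python sketch of the simplest kind of Markov model: one that predicts the next word from only the single word before it. This is a toy under stated assumptions, not how ChatGPT is built. Real LLMs use neural networks conditioned on far more context, and the function names and the tiny corpus below are invented for illustration.

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_word_distribution(model, prev):
    """Turn the raw follow-counts for `prev` into probabilities."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the model has never seen this word
    return {word: n / total for word, n in counts.items()}

# Hypothetical toy corpus, standing in for "the entire internet".
corpus = "the cat sat on the mat and the cat ate the fish"
model = build_bigram_model(corpus)
dist = next_word_distribution(model, "the")
print(dist)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

The model has no notion of what a cat or a mat is; it only knows which words tended to follow “the” in the text it was given.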

He thinks the hallucinations are in a sense “forced” on the AI software, sometimes because the human prompts were not good enough.

“The hallucination seems like, wait, you’ve gone crazy in the middle of your logic! But, in fact, it’s just an artifact of the algorithm […] it has a distribution of all the words that could possibly come next, and it picks one with some statistical randomness. And the hallucination is what happens when it gets to a point where it gets very unlucky, so to speak; or, given the prompt, it is not obvious what to say. And so it’s forced to pick something, more or less a shot in the dark.”
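
The “shot in the dark” Kalb describes can be sketched as sampling from a probability distribution. The Python below is an illustration only: the two distributions are invented numbers, not output from any real model, and the entropy helper is simply one way to put a figure on how “obvious” the next word is.

```python
import math
import random

def entropy_bits(dist):
    """Shannon entropy in bits: higher means the next word is less obvious."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def sample_next_word(dist, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Invented distributions for illustration.
# One continuation dominates, so sampling almost always picks it.
confident = {"mat": 0.97, "sofa": 0.02, "roof": 0.01}
# Four equally likely continuations: whatever is picked is close to a guess.
uncertain = {"1987": 0.25, "1992": 0.25, "2001": 0.25, "2010": 0.25}

rng = random.Random(0)  # seeded so repeated runs give the same draws
for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    draws = [sample_next_word(dist, rng) for _ in range(5)]
    print(name, round(entropy_bits(dist), 2), draws)
```

The uniform four-way case has an entropy of exactly 2 bits, while the confident case sits well under 1 bit. In both cases the sampler must still return a word, which is the sense in which the model is “forced to pick something.”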

How Data Intelligence Fits into the AI Landscape

So what is “data intelligence”? Kalb started answering that by noting that both AI and the common enterprise acronym of BI (business intelligence) are “garbage in, garbage out.”

“So data intelligence is this layer that precedes AI and BI, that makes sure you can find, understand and trust the right data to put into your AI and BI.”

In this context, he said, taking something like ChatGPT from the public internet and bringing it into the enterprise is very risky. He thinks that data needs to be, well, more intelligent before it is used by AI systems within an enterprise.

Also, he doesn’t think that the “internet scale” of ChatGPT and similar systems is needed in the enterprise. This is where Alation’s “data catalog” comes into play, as it will “distill down” the data and give it “specific mapping.”

Every organization has its own terminology, he said — that could be industry terms, or things that are very specific to that company.

“So that’s where data intelligence and the data catalog helps,” Kalb explained. “It helps to map that last mile of how language is used by people in the organization, and how data is stored in the databases.”

Alation’s software automates the process of putting an organization’s data into these “data catalogs,” which can then optionally be fed into a generative AI system (if the company wants to do that).
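
As a rough illustration of the “last mile” mapping Kalb describes, the Python sketch below links business terms to the places the data is stored, along with a flag set by human curation. It is not Alation’s API or schema; every class, table and column name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    term: str      # how people in the organization say it
    table: str     # where the data lives (hypothetical names)
    column: str
    trusted: bool  # set by human curation

# Hypothetical catalog: the same term can point at several places,
# only some of which have been vetted.
CATALOG = [
    CatalogEntry("churn", "crm.accounts", "cancelled_at", True),
    CatalogEntry("arr", "finance.subscriptions", "annual_recurring_usd", True),
    CatalogEntry("arr", "sandbox.tmp_arr_v2", "arr_guess", False),
]

def resolve(term, catalog=CATALOG):
    """Return the trusted physical locations for a business term."""
    key = term.strip().lower()
    return [
        f"{entry.table}.{entry.column}"
        for entry in catalog
        if entry.term == key and entry.trusted
    ]

print(resolve("ARR"))  # ['finance.subscriptions.annual_recurring_usd']
```

An AI or BI tool that looks terms up this way would only ever be pointed at curated sources, which is one reading of the “find, understand and trust” role described above.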

The way Kalb explains it, data intelligence is “step zero for whatever the task is — whether it’s [data] preprocessing, or ML training, or just making a spreadsheet and analyzing it for a shareholder meeting.”

Welcome to the Next Wave of Big Data

So far I’ve spoken to generative AI companies like Cohere and Vectara about their vision for enterprise IT. Both mentioned the use case of an employee being able to have a conversation with an AI built on large language models: essentially what IT has traditionally called “knowledge management,” but now in chatbot form.

Kalb makes a good point, though: much depends on the quality of the data the generative AI has been trained on. He sees data intelligence as “the missing link” between ChatGPT and “the dream of having an enterprise portal where you can ask a question in English and get an accurate, trustworthy answer about your business.”

So just as cloud computing ushered in a raft of useful “big data” companies built off the back of it, it seems clear that generative AI will be a catalyst for the next wave of data intelligence solutions. As I’ve been saying a lot this year in relation to AI, watch this space!

Richard MacManus is a Senior Editor at The New Stack and writes about web and application development trends. Previously he founded ReadWriteWeb in 2003 and built it into one of the world’s most influential technology news sites. From the early...
TNS owner Insight Partners is an investor in: Databricks.