
URL: https://thenewstack.io/machine-learning-still-struggles-to-extract-meaning-from-language/

Machine Learning Still Struggles to Extract Meaning from Language - The New Stack


2021-09-01 10:00:12

Machine Learning Still Struggles to Extract Meaning from Language

Intrinsic elements of language make it incredibly challenging to apply mathematical techniques that deliver real understanding of meaning in natural language.
Sep 1st, 2021 10:00am by Walt Mayo
Feature image by lisa runnels from Pixabay
Walt Mayo
Walt Mayo is chief executive officer of expert.ai. Prior to joining expert.ai, Walt led the growth of Endeavor, an impact capital organization focused on scale-up businesses, more than tripling its global market reach and developing major new sources of revenue. After graduating from Harvard University, he began his career as a Foreign Service Officer and served in US diplomatic missions, the White House and the US Congress. He has an MBA from the University of Virginia and lives in the Boston area.

Language is a fascinating construct that sits at the heart of how humans share and understand ideas and knowledge. It is enormously complex and nuanced, yet most people never think of it that way, because it seems (and is) instinctive and natural. That’s why we call the language of human communication “natural language.”

We begin absorbing language in infancy. The simple words come in the first year or two. By age six, we’ve added thousands more to our vocabulary, and by our teenage years we know upwards of 100,000 words. But as innate as language is for humans, machines find it very difficult.

This is a classic example of Moravec’s paradox, the observation that what is easy for people is often hard for machines, and vice versa. Software can compute mathematical operations on large sets of numbers quickly and flawlessly, but it struggles with everyday human activities like recognizing objects in their surroundings or comprehending language. And while there has been a tremendous amount of work to develop software that understands natural language the way humans do, it remains a major challenge.

Words Are Not Numbers

The last 20 years have seen an explosion in the amount of data of all forms produced and captured. Broadly, this data falls into two categories: structured and unstructured. Structured data is numerical and organized, and is by definition the basic input of mathematical operations. Thanks to machine learning (ML) and the overall growth in data processing capability, AI has made solid progress in producing predictive insights from structured data for everything from potential machinery failure to fraud detection. If you can express and structure data numerically, you have a potential candidate for machine-learning-driven insights.
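
To make the point concrete, here is a minimal sketch of why numeric, structured rows suit machine learning so directly. It assumes a hypothetical table of machine sensor readings (`temperature_c`, `vibration_mm_s`) with invented values and labels, and uses a nearest-centroid classifier, one of the simplest learned models. It is an illustration, not a production failure predictor.

```python
from math import dist
from statistics import mean

# Hypothetical structured records: (temperature_c, vibration_mm_s) per machine,
# labelled with whether that machine later failed. All values are invented.
TRAINING_ROWS = [
    ((61.0, 2.1), "ok"),
    ((63.5, 2.4), "ok"),
    ((60.2, 1.9), "ok"),
    ((78.9, 7.8), "failing"),
    ((82.3, 8.5), "failing"),
    ((80.1, 7.1), "failing"),
]


def fit_centroids(rows):
    """Average the feature vectors that share a label (nearest-centroid training)."""
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAINING_ROWS)
print(predict(centroids, (79.5, 7.4)))  # failing
print(predict(centroids, (62.0, 2.2)))  # ok
```

Because each row is already a vector of numbers, "learning" here is just averaging and distance arithmetic. A real system would at least standardize the features, since raw temperature values dominate the distance calculation.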

But digital technology has also produced a massive increase in unstructured data, which includes pictures, videos, and language data. This is where traditional machine learning-based natural language processing (NLP) techniques have fallen short. Language is data-dense: it carries a tremendous amount of potential information depending on how it is used.
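
One common way to force language into the numeric form that machine learning expects is a bag-of-words vector, which simply counts tokens. The minimal sketch below assumes plain lowercase whitespace tokenization for brevity, and shows how much meaning that conversion can discard: two sentences with different meanings map to exactly the same vector.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


def to_vector(counts, vocabulary):
    """Lay the counts out as a fixed-length numeric vector over a shared vocabulary."""
    return [counts[word] for word in vocabulary]


first = bag_of_words("The dog bit the man")
second = bag_of_words("The man bit the dog")
vocabulary = sorted(set(first) | set(second))

print(vocabulary)                    # ['bit', 'dog', 'man', 'the']
print(to_vector(first, vocabulary))  # [1, 1, 1, 2]
print(to_vector(second, vocabulary)) # [1, 1, 1, 2]
```

Richer encodings such as n-grams or learned embeddings recover some of this information, but the example shows why turning words into numbers is not a lossless step.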

As a thought exercise, just count the meanings and usages of a common word like “bat”: a flying mammal, a piece of sports equipment, the verb in “didn’t bat an eye.” These meanings flow from context. Linguist J.R. Firth wrote, “You shall know a word by the company it keeps.” These intrinsic elements of language make it incredibly challenging to apply mathematical techniques to deliver a real understanding of the meaning in natural language. And yet, there is a more fundamental shortcoming of a “one-size-fits-all” machine learning approach to language: the knowledge problem.
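
Firth’s idea can be turned into a crude disambiguation heuristic: compare the words around “bat” with words each sense tends to co-occur with, and pick the sense with the largest overlap. This is a simplified Lesk-style overlap method, not a technique the article describes, and the sense signatures below are hypothetical, hand-written examples.

```python
import re

# Hypothetical "company" that each sense of "bat" tends to keep.
SENSE_SIGNATURES = {
    "bat (animal)": {"cave", "wings", "night", "nocturnal", "flew", "echolocation"},
    "bat (sports equipment)": {"ball", "swing", "cricket", "baseball", "hit", "pitcher"},
    "bat (to blink)": {"eye", "eyes", "eyelid", "eyelids"},
}


def disambiguate(sentence, signatures=SENSE_SIGNATURES):
    """Return the sense whose signature shares the most words with the context,
    or None when no context word matches any signature."""
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {"bat"}
    overlaps = {sense: len(context & words) for sense, words in signatures.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


print(disambiguate("The bat flew out of the cave at night"))  # bat (animal)
print(disambiguate("He gripped the bat and hit the ball"))    # bat (sports equipment)
print(disambiguate("She did not bat an eye"))                 # bat (to blink)
print(disambiguate("I found a bat"))                          # None
```

The last call shows the weakness: with no informative neighbors the heuristic has nothing to go on, and a real system would need signatures for every sense in every domain it handles.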

The Knowledge Problem

The language challenge compounds when you enter the real world of complex documents that power so many enterprises and are unique to their domains. These are, by definition, edge cases that make the language even more complex. Machine learning models know the world only through the data on which they are trained, and they arrive at their outcomes through algorithms that are in many cases complex and opaque: the famous “black box” characteristic of so many AI approaches.

Much of the work in delivering a real-world solution rests on ensuring the data set is large enough and representative enough to capture the information that a subject matter expert recognizes only after years of experience and training. In many cases, such a large volume of training data is not available. The work is also ongoing: the real world changes over time, so models need to be retrained.
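
A crude way to probe whether training data is representative of a target domain is to check how much of a domain document’s vocabulary the training corpus has ever seen. The sketch below assumes simple lowercase word tokenization and invented example texts; low coverage is only a warning sign, not a full measure of whether a model will understand the domain.

```python
import re


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple assumption."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary_coverage(training_texts, domain_text):
    """Return (share of domain tokens seen in training, sorted list of unseen words)."""
    seen = {token for text in training_texts for token in tokenize(text)}
    tokens = tokenize(domain_text)
    if not tokens:
        return 1.0, []
    covered = sum(1 for token in tokens if token in seen)
    unseen = sorted({token for token in tokens if token not in seen})
    return covered / len(tokens), unseen


# Invented example: a tiny training corpus versus a new clinical note.
training = ["the patient was discharged", "the patient reported pain"]
note = "the patient presented with dyspnea and tachycardia"

share, missing = vocabulary_coverage(training, note)
print(f"{share:.2f}")  # 0.29
print(missing)         # ['and', 'dyspnea', 'presented', 'tachycardia', 'with']
```

In this toy case the words that carry the note’s clinical meaning are exactly the ones the training data never contained.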

Even the much-publicized advances of large language models like GPT-3 offer little reason to think this complexity has been overcome. These models rely on massive data sets for their training and can handle relatively simple language cases. But lacking any true grounding in a specific domain, they fall well short of the experience and knowledge a human uses to understand intent, context, and meaning.

The Whole Exceeds the Sum of the Parts

There is emerging recognition of the need to combine the capabilities of machine learning approaches with knowledge-based approaches that build on what the experts in an enterprise develop over years. These knowledge-based approaches are known as symbolic AI and rely on techniques for embedding knowledge similar to how humans build their own mastery of a subject.

The symbolic approach offers the added benefit of explainability, because outcomes are tied to explicit representations of knowledge. In fact, the symbolic approach was the first technique used for AI natural language understanding, and it is increasingly viewed as a necessary complement to more recent machine learning approaches.
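
To illustrate what “explicit representations of knowledge” can look like, here is a minimal rule-based sketch in which every decision carries the rule that produced it. The rules, labels, and contract-clause domain are hypothetical, and production symbolic systems typically rely on much richer knowledge representations than these keyword sets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One explicit, human-readable piece of (hypothetical) domain knowledge."""
    name: str
    required_terms: frozenset
    label: str


RULES = [
    Rule("termination-notice", frozenset({"terminate", "notice"}), "termination clause"),
    Rule("payment-due", frozenset({"payment", "due"}), "payment clause"),
    Rule("liability-cap", frozenset({"liability", "exceed"}), "limitation of liability"),
]


def classify(text, rules=RULES):
    """Return (label, explanation) from the first rule whose terms all appear."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    for rule in rules:
        if rule.required_terms <= tokens:
            terms = ", ".join(sorted(rule.required_terms))
            return rule.label, f"rule '{rule.name}' matched: {terms}"
    return None, "no rule matched"


print(classify("Either party may terminate this agreement with 30 days notice."))
# ('termination clause', "rule 'termination-notice' matched: notice, terminate")
print(classify("Total liability shall not exceed the fees paid."))
# ('limitation of liability', "rule 'liability-cap' matched: exceed, liability")
```

The explanation string is the point: a reviewer can see exactly which piece of knowledge fired. The brittleness is also visible, since “terminates” or “notification” would not match without the morphological knowledge this sketch omits.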

The combination of learning and knowledge approaches offers the ability to generate deep understanding at scale with insights relevant to the domain and outcomes that are explainable. This “hybrid” approach can enable people to be better at their jobs (be more expert) by ensuring that relevant information embedded in language is captured and delivered in a scalable way for faster, smarter, and more consistent decisions. This is ultimately the arena in which businesses compete, and where the best technology delivers.
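
One simple way to arrange the “hybrid” idea is to let explicit expert rules decide the cases they cover, with an explanation attached, and fall back to a statistical model everywhere else. The sketch below assumes an invented insurance-claims example, hypothetical rules, and a minimal multinomial naive Bayes as the learned component; it is one possible arrangement, not the architecture of any particular product.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing (the learned half)."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        words = [w for w in tokenize(text) if w in self.vocabulary]  # skip unseen words

        def log_score(label):
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.label_counts[label] / total)
            return score + sum(math.log((counts[w] + 1) / denominator) for w in words)

        return max(self.label_counts, key=log_score)


# Hypothetical expert rules: (terms that must all appear, label). They run first.
EXPERT_RULES = [
    ({"hail", "car"}, "auto"),
    ({"sewer", "backup"}, "property"),
]


def hybrid_classify(model, text):
    """Apply expert rules first; otherwise defer to the statistical model."""
    tokens = set(tokenize(text))
    for terms, label in EXPERT_RULES:
        if terms <= tokens:
            return label, f"expert rule matched: {', '.join(sorted(terms))}"
    return model.predict(text), "statistical model (no rule matched)"


# Invented training data for two claim categories.
TRAINING = [
    ("car rear ended at traffic light", "auto"),
    ("vehicle bumper damaged in parking lot", "auto"),
    ("windshield cracked on highway", "auto"),
    ("kitchen pipe burst and flooded floor", "property"),
    ("tree fell on house during storm", "property"),
    ("basement water damage after rain", "property"),
]

model = NaiveBayes().fit(TRAINING)
print(hybrid_classify(model, "Hail dented the car hood"))
# ('auto', 'expert rule matched: car, hail')
print(hybrid_classify(model, "A pipe burst in the kitchen"))
# ('property', 'statistical model (no rule matched)')
```

Other hybrid designs feed rule outputs into the model as features or use rules to check model predictions; putting rules first is simply the easiest arrangement to read and explain.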
