You’ve seen it everywhere: your application is having major problems, but your IT and application performance monitoring tools have not identified any issues. The wide range of outages affecting application performance points to growing problems with your data pipelines.
Consequently, data quality has become a hot topic again and new tools have started to appear. But why is this happening? Why do we need to resolve a problem that’s been around since data itself, and that already has an incumbent stack of legacy tools?
Two words: Big data.
The growth in data volume over the past 10 years has created a tectonic shift in the requirements for data quality tools — and legacy tools don’t meet them anymore.
Here’s why.
Legacy data quality tools were designed to serve a different world of data. Informatica Data Quality was released in 2001. Talend was released in 2005. Comparable tools arrived in the same window. But the world of “big data” was created by three events that arrived much later.
ETL for big data began with Hadoop, which was released in 2006, but didn’t penetrate the mainstream Fortune 500 enterprise segment for another decade.
Mainstream cloud adoption began with Amazon Web Services, which was publicly launched in 2006, but wasn’t fully accessible until Redshift became generally available in 2013.
Cloud Data Warehouses (CDWs) made data warehousing accessible to everyone. But Snowflake wasn’t founded until 2012, followed by Databricks in 2013.
In short: legacy data quality tools were created long before big data arrived. As such, they were never designed to solve data quality in a big data world. While they have tried to catch up, they fundamentally do not meet the unique requirements created by the 44x increase in data production we’ve seen from 2010 to 2020.
Big data has made legacy tools ineffective across multiple requirements.
We have also experienced cultural changes that created their own new requirements.
These new requirements have been quietly building over the last decade, and have suddenly begun to drive new conversations around data quality for one core reason.
After a period of heavy flux in the ETL jungle, a new and stable ELT data stack has emerged. And the centerpiece of the new stack, the data warehouse, enforces fewer data integrity checks and constraints than traditional databases.
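To make that gap concrete, here is a minimal sketch of the kind of check a pipeline has to run itself when the warehouse does not enforce uniqueness or not-null rules on load. The function name, the `orders` rows, and the column names are all hypothetical and only illustrate the idea; this is not how any particular tool works.

```python
from collections import Counter
from typing import Any


def find_integrity_violations(
    rows: list[dict[str, Any]], key: str, required: list[str]
) -> dict[str, list]:
    """Report duplicate key values and rows with missing required fields.

    Hypothetical helper: it mimics, in application code, the uniqueness and
    not-null checks a traditional database would enforce at write time.
    """
    # Count how often each key value appears; more than once is a duplicate.
    key_counts = Counter(row.get(key) for row in rows)
    duplicate_keys = sorted(
        k for k, n in key_counts.items() if k is not None and n > 1
    )

    # Record the index of every row where a required column is missing or None.
    rows_missing_fields = [
        i
        for i, row in enumerate(rows)
        if any(row.get(col) is None for col in required)
    ]
    return {
        "duplicate_keys": duplicate_keys,
        "rows_missing_fields": rows_missing_fields,
    }


# Hypothetical sample data: a repeated order_id and a missing customer_id.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 10.0},
    {"order_id": 2, "customer_id": None, "amount": 5.0},
    {"order_id": 1, "customer_id": "b", "amount": 7.5},
]

report = find_integrity_violations(
    orders, key="order_id", required=["customer_id", "amount"]
)
print(report)  # {'duplicate_keys': [1], 'rows_missing_fields': [1]}
```

A traditional database with a primary key and not-null constraints would reject the second and third rows at write time. In a stack where those rules are not enforced, both rows land silently unless something like the check above runs after the load.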
At the same time that support for data quality is thinner than before, companies depend on their data more than before. Every company is now data-driven, nobody can afford bad data anymore, and the flaws in legacy tools are really starting to hurt.
In summary, it has become painfully obvious that too much has changed, that legacy tools do not work in the new world of data, and that we need to rethink the data quality problem from a clean slate.