Clean data – Is the juice worth the squeeze?

Jun 7, 2022 10:07:00 AM

Are we reaching the point where we should seriously give up on our obsession with clean data?

No one would argue against using clean data when making business decisions, but it is time to reconsider our approach to how much data we clean and when we clean it.

A lot has changed in the world of enterprise systems since our obsession with data cleaning first started. We now live in a world of disconnected systems that produce inordinate amounts of data to drive each function of our business.

While some teams, let us take Finance as an example, might do a fantastic job at keeping their data clean in the Finance systems, other teams are often considered repeat offenders when it comes to creating duplicate and dirty data. An example is the Sales team creating a customer name or account name first in the CRM (Customer Relationship Management) system often without regard for the legal company name, which is what Finance and Customer Service eventually need.

Human after all.

The purpose of this discussion is not to call people out. It is to explain that no matter how hard individual teams in an organization try to keep the data clean within each system, it is almost impossible to keep data clean across different enterprise systems.

When you consider the increasing volume of data that these systems now generate, even the most conscientious and diligent professionals, especially in the data team, understand deep down that this is becoming a losing battle.

Invariably, the first step in almost any data project is to clean the data. This is done before knowing if the data holds the answer to the business question or is critical to solving the business problem.

Only clean as and when.

Why are we still spending time and resources cleaning data that no one is going to use?

In most organizations, data is cleaned and scrutinized before anyone knows which data is needed. Given the sheer volume of data, every project takes longer than it needs to, and critical business decisions are delayed as a result.

If we are brave enough to accept that it is just no longer possible to clean all the data, we can focus instead on what is most important: using data to quickly gain business insights and make informed decisions faster.

With customer, supplier, product, and employee records being created at distinct stages in different enterprise systems, there is little doubt that securing the referential integrity of key data fields across all systems is critical.

Referential integrity is what allows data to be joined across systems, and that join is what provides a multi-dimensional view of the data. Even with everybody “sticking to the script” and keeping the data clean in each separate system, it is too difficult to ensure data integrity once data from multiple systems needs to be blended together.
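To make the joining problem concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two record lists stand in for a CRM and a Finance system, the field names are hypothetical, and the suffix list is deliberately incomplete. It shows that joining on the raw name finds nothing, while a small normalization step lets the same records line up.

```python
import re

# Hypothetical records: the CRM holds the name Sales typed in,
# Finance holds the legal company name.
crm_accounts = [
    {"account": "Acme", "owner": "J. Smith"},
    {"account": "Globex Corp.", "owner": "A. Jones"},
]
finance_customers = [
    {"legal_name": "ACME Inc.", "balance": 1200.0},
    {"legal_name": "Globex Corporation", "balance": 450.0},
]

# Legal-form suffixes to ignore when comparing names (illustrative list only).
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def join_on(key_fn):
    """Join CRM accounts to Finance customers using key_fn to build the key."""
    finance_by_key = {key_fn(c["legal_name"]): c for c in finance_customers}
    return [
        {**a, **finance_by_key[key_fn(a["account"])]}
        for a in crm_accounts
        if key_fn(a["account"]) in finance_by_key
    ]


exact_matches = join_on(lambda name: name)  # raw names never line up
normalized_matches = join_on(normalize)     # both accounts now join
```

A rule-based key like this is only a starting point. Real names vary in ways a short suffix list will not catch, which is where the automated approaches discussed later come in.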

However, there are a number of emerging data platforms that can leverage new technologies to dynamically clean data and automatically rebuild customer, product, and supplier master lists across any range of source systems.

<begin pull quote>

The emerging philosophy of cleaning only the data you need when you need it is anchored in recent technologies that can clean data faster.

<end pull quote>

Let the machine take the strain.

Modern technologies use Natural Language Processing, Artificial Intelligence, and Machine Learning to automate much of the data cleaning. Automation is necessary given the size and scale of this task.

By adopting this philosophy and leveraging innovative technologies, organizations do not have to clean data that people are not using, meaning data projects deliver results much faster.
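The idea of cleaning only what is used can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular platform works: the class name `LazyCleanedRecords`, the sample rows, and the two cleaning rules are all hypothetical. A value is cleaned the first time someone reads it, the result is cached, and anything nobody reads is never cleaned.

```python
class LazyCleanedRecords:
    """Hypothetical wrapper that cleans a field only when it is first read."""

    def __init__(self, raw_rows, cleaners):
        self._raw = raw_rows        # untouched source rows
        self._cleaners = cleaners   # field name -> cleaning function
        self._cache = {}            # (row index, field) -> cleaned value
        self.cleaned_count = 0      # how many values were actually cleaned

    def get(self, row_index, field):
        key = (row_index, field)
        if key not in self._cache:
            # Fields without a registered cleaner are passed through as-is.
            cleaner = self._cleaners.get(field, lambda value: value)
            self._cache[key] = cleaner(self._raw[row_index][field])
            self.cleaned_count += 1
        return self._cache[key]


# Illustrative data and rules (assumptions, not real system output).
raw_rows = [
    {"customer": "  acme INC ", "country": "u.s.a."},
    {"customer": "globex  corp", "country": "UK "},
]
cleaners = {
    "customer": lambda v: " ".join(v.split()).title(),
    "country": lambda v: v.strip().upper().replace(".", ""),
}

records = LazyCleanedRecords(raw_rows, cleaners)
first_customer = records.get(0, "customer")  # cleaned on first access
records.get(0, "customer")                   # served from the cache
```

After these two reads only one of the four raw values has been cleaned. The other three stay untouched until a question actually needs them.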

New data blending and preparation platforms instantly connect data, swiftly structure it, and then present it in products like Excel. Users can easily review this data and, once they decide it looks promising, start the clean-up process as they begin working with it.

Any changes to the data are immediately reflected in whatever platform is used to view the data. The more time invested in the data the better it becomes. Use a data blending and preparation platform to only worry about cleaning the data you need when you need it.

To find out more about these types of systems please go to
