
Why You Shouldn't Use CSV for Integration


May 10, 2022 10:00:00 AM

Why Using CSV for Data Integration Is a Bad Idea

It has been 50 years since the Comma Separated Values (CSV) data format was created. On one side, it is a testament to its purpose and design that it is still so heavily used and relied on today. But are we still using CSV because it is perfect, or because a better alternative has yet to emerge?

It truly has become a Swiss army knife, or “get out of jail free” card, for many data integration challenges over the last five decades.

On the other side, given the level of innovation that has taken place in the software industry in the intervening years, it is a sad reflection on the industry that companies are still so heavily reliant on this approach to data integration five decades later.

Universal integration adapter.

There is no doubt that CSV has become the data equivalent of the universal power adapter we all rely on so heavily as we travel from country to country. It quickly became the lowest-common-denominator way to share data between applications. For software vendors, CSV was billed as a quick and cheap way to integrate with just about anything. It magically turned proprietary systems into non-proprietary ones and became the data bus that let standalone applications share critical data. It was - and for many people still is - a data lifesaver.

What is surprising is that after all these years people’s ambitions have not risen any higher than the lowest common denominator. This lack of energy to pursue an alternative is clearly not because CSV is perfect. It is far from it, especially when you consider all the unnatural acts people must go through daily to gather and clean data to create the CSV in the first place.

If you wanted to be a little more cynical, you could say that CSV was the ultimate three-card trick for the software industry to ensure data integration did not slow down software sales. It’s incredibly easy for a software vendor to say, “If you get your data into a CSV, we can load it into our system.” However, what people soon realize is that gathering and cleaning the data from multiple systems to create the CSV in the first place is the hard bit. Loading it is easy.

Leaving aside for a moment the behavioral reasons why CSV has stood the test of time, what is harder to reconcile is why the data governance police have not moved to shut down this potential threat much earlier. We live in a much more regulated world than existed 10 years ago - let alone 50 years ago - and yet, amazingly, regulators and auditors are prepared to accept such a high-risk approach to sharing business-critical data across critical business systems.

CSV is a risky back door solution.

Not only is creating a CSV prone to manual error, but the resulting file also lacks traceability and transparency around how the numbers were created and who might have changed them. Given the spotlight on data governance and regulation, one can only assume that the reason this “back door” has not been slammed shut forever is that companies are still heavily reliant on CSV integration to keep their business wheels turning, and the market feels there is a lack of viable alternatives.
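To make the traceability point concrete, here is a minimal Python sketch. The account numbers, amounts, and helper names (`load`, `save`) are invented for illustration and do not come from any real system. It simply shows that a value can be changed in a CSV and the file that comes out the other side looks exactly as legitimate as the original.

```python
import csv
import io

# Hypothetical export from a finance system (made-up data for illustration).
original = "account,amount\n00123,1500.00\n00456,250.50\n"

def load(text):
    """Parse CSV text into a list of dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))

def save(rows):
    """Write the rows back out as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["account", "amount"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

rows = load(original)
rows[0]["amount"] = "1050.00"  # a manual "correction", or simply a typo
edited = save(rows)

# The edited file is just as valid as the original. Nothing inside it records
# who made the change, when it happened, or what the value used to be.
# The change only shows up if someone kept the original and compares the two.
changed = [(before, after) for before, after in zip(load(original), load(edited)) if before != after]
print(changed)
```

The only way to spot the edit here is to hold on to the original and compare the two versions, which is exactly the kind of bookkeeping a plain CSV hand-off leaves to chance.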

While companies might feel they are stuck with CSV integration for some time to come, it is surprising that their users have not been more aggressive in seeking alternatives.

Waste of human capital.

Gathering, cleaning, and consolidating data in spreadsheets might offer some soothing properties the first time you do it, but it soon loses its appeal when you do it every day, every week, or every month, depending on what the data is used for. In a world where there is an emerging focus on waste - wasted time, money, and resources - the manual cleaning of data in 2022 must be one of the greatest ongoing wastes of human capital imaginable.

This apparent willingness on the part of employees to say nothing and plow on regardless merits closer inspection. While some people might be undertaking these mundane activities because their job simply requires them to, that doesn’t explain why such a large cohort of clearly overqualified people are also prepared to just continue to grind out the data hard yards.

The most obvious reason they continue to do this might simply be because they can. Think about it: CSV integration has been more of a lifesaver for data workers and business end users than for anybody else.

It has given people the independence to move their data between applications without needing to rely on other colleagues or departments. It has given users complete independence in their job - all their unnatural data acts are a small price to pay to preserve that freedom.

While it is hard to imagine that the creators of the CSV format expected it to still be so heavily relied on 50 years later, it is equally hard to imagine a world where CSV integration does not exist.

Imagine something better.

What is easiest to imagine is a world where companies and employees can do their jobs the way they want to without relying on CSV, and without taking on the risks and the waste of resources it brings with it.

A new generation of data integration and blending platforms has emerged that provides the governance that regulators and auditors require and the control and auditability that IT desires, while maintaining business users’ data independence.

Most importantly, these data integration and blending platforms are so easy to use that people can self-serve all their data integration requirements while eliminating the risk the CSV back door creates for the business.

To find out more about these new-generation data blending platforms, go to www.eyko.io.