Hands up - who has been involved in or heard of a data warehousing project that has gone horribly wrong? And in the interest of balance, what about a data warehouse success story? While there will clearly be hands up on both sides of the aisle, there is no doubt that the horror side of the debate has a louder voice, and it is not just because bad news tends to travel faster.
Few enterprise technologies have promised more than the data warehouse. For decades, software vendors and consulting companies have presented it as the cure for just about any problem involving enterprise data.
Data warehouses first emerged in the 1980s, around the same time as the precursors to today’s ERP systems were starting to take off. These enterprise applications brought unprecedented control and structure to business processes such as manufacturing, distribution, and finance, but they were difficult to use, had complex data structures, and offered business users very little management information to run the business.
Simply genius concept.
The concept of a data warehouse at the time was simply genius – giving business users a way to work with the data and apply their own rules to the data, independent of the specific data structures in the transaction systems. IT supported this approach because it took the reporting strain off the production systems and did not risk impacting the performance of these critical systems that were running the business.
While data warehousing at its core is a simple concept, its flexibility meant it proved more difficult to implement than people first imagined. Some of this was down to the technical choices people made, but a significantly greater element was that the business could not agree on what the right metrics were to run the business or how the data needed to be consolidated.
On the technology side, the biggest challenges invariably centered around connecting to multiple source systems, then cleaning and organizing the data into a common structure. Once data warehouses became popular, ETL (Extract, Transform, Load) tools emerged to offer a ready-made way to automate and simplify the creation and loading of data into the data warehouse.
These tools also helped overcome another challenge for lots of implementations: how and when the data would be refreshed from the source systems into the data warehouse. When data warehouses first emerged, the need for “real time or right time” data was not as pressing as it is today, and overnight updates were more than acceptable for everybody.
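To make the extract, transform, load idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two "source systems" are hard-coded lists standing in for an ERP and a CRM extract, the field names and the `sales` table are hypothetical, and an in-memory SQLite database plays the role of the warehouse. Real ETL tools add scheduling, error handling, and incremental loads on top of this basic shape.

```python
import sqlite3

# Extract: hypothetical rows from two source systems that name and
# format the same information differently.
erp_rows = [
    {"cust_id": "C001", "amount": "1200.50", "country": "ie "},
    {"cust_id": "C002", "amount": "300", "country": "US"},
]
crm_rows = [
    {"CustomerNumber": "c001", "Revenue": 99.5, "Country": "IE"},
]


def transform_erp(row):
    """Clean an ERP row into the common (customer, amount, country, source) shape."""
    return (
        row["cust_id"].strip().upper(),
        float(row["amount"]),
        row["country"].strip().upper(),
        "erp",
    )


def transform_crm(row):
    """Clean a CRM row into the same common shape."""
    return (
        row["CustomerNumber"].strip().upper(),
        float(row["Revenue"]),
        row["Country"].strip().upper(),
        "crm",
    )


def load(conn, rows):
    """Load cleaned rows into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id TEXT, amount REAL, country TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
cleaned = [transform_erp(r) for r in erp_rows] + [transform_crm(r) for r in crm_rows]
load(conn, cleaned)

# Reporting now runs against the warehouse, not the source systems.
totals = dict(
    conn.execute("SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id")
)
```

In the overnight-refresh model described above, a job like this would simply be run on a schedule after the close of business.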
The business cannot agree.
While the technical challenges were big to begin with, they were not the only driver of the project overruns that have left an unpleasant smell around data warehouse projects in many organizations.
To implement a data warehouse, the organization not only needs to agree on the metrics it should use to run the business, but also on the definition of these metrics. Discussion around the data ranges from what type of data people need, to what level of data is required, and how it should be consolidated. While data warehouses are flexible in how you define the hierarchies in your data, once they are built, they are not easy to change.
Because changing the models takes so much effort, project teams have to look into the corporate crystal ball, try to predict the questions the business will be asking months into the future, and make sure the data required to answer those questions is not only included in the model but also consolidated in the right way to give the necessary answers.
The pain and effort in reworking the models once built fuels a type of “analysis paralysis” which means projects slip, people lose confidence, and some end users start to explore alternatives that will give them the information they need to do their job without having to rely on the data warehouse.
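The disagreement over hierarchies is easy to illustrate. The sketch below is hypothetical throughout (the stores, revenue figures, and both hierarchy definitions are invented for the example): the same three fact rows are consolidated under two competing definitions of "region", and each gives a different set of totals. In a traditional warehouse only one of these would be built into the model, which is why the choice gets argued over for so long.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, revenue).
facts = [("Dublin", 100.0), ("Cork", 50.0), ("London", 200.0)]

# Two competing definitions of how stores roll up into regions.
by_country = {"Dublin": "Ireland", "Cork": "Ireland", "London": "UK"}
by_sales_area = {"Dublin": "North", "Cork": "South", "London": "North"}


def roll_up(rows, hierarchy):
    """Consolidate revenue to the parent level defined by `hierarchy`."""
    totals = defaultdict(float)
    for store, revenue in rows:
        totals[hierarchy[store]] += revenue
    return dict(totals)


country_totals = roll_up(facts, by_country)
area_totals = roll_up(facts, by_sales_area)

print(country_totals)  # {'Ireland': 150.0, 'UK': 200.0}
print(area_totals)     # {'North': 300.0, 'South': 50.0}
```

Here switching hierarchies is a one-line change because the roll-up is computed on demand. Once the consolidation is baked into a physical model, the same change means redesigning and reloading it.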
While the concept of a data warehouse is old, the business problem it looks to solve is as big an issue today as at any time in the past. With ever more business data fragmented across a wide range of disconnected systems running either in the cloud or on-premises, the challenge of pulling this data together to run the business has never been greater.
For people who are committed to this idea of gathering data and organizing data centrally and enabling business users to work with this data, several modern technologies have emerged that address both the technical and human challenges that blunted the positive impact of data warehouses the first time around.
A new generation of data blending and streaming platforms offers all the benefits of a data warehouse while removing the challenges outlined above, creating what some people might call a virtual data warehouse.
These technologies are sophisticated in how they understand the systems they connect to. They leverage modern technologies like AI and Machine Learning to clean data from multiple systems at speed and scale. They hold and aggregate the data in a central repository and enable people to stream this data directly into whatever application they use to consume the data. These platforms do not care what application the user wants to work in; they focus on ensuring the data is consistent regardless of the source application.
These platforms change how people approach their data projects. They encourage a less procedural and more agile approach. The ease with which you can connect to systems, blend data, and see the result in Excel, for example, means that people spend less time designing the perfect system and more time perfecting the system as they go.
No more spending time cleaning data that people do not use. This “perform an action and immediately see the results” approach means people see results faster and are more likely to stay with the project as opposed to going off and building their own shadow data system.
For people who don’t already have a data warehouse, these data blending and streaming platforms deliver the benefits without the pain. For people who already have one, this new generation of solutions can coexist with it by sourcing data directly from enterprise applications and streaming cleaned, blended, and enriched data into the existing data warehouse.
By leveraging these solutions, organizations can either continue to use their data warehouses or create a virtual data warehouse and get the benefits of centrally cleaned and aggregated data from multiple sources, streamed directly to any application, without the downsides.
To find out more about these types of systems please go to www.eyko.io