Join Alan Simon for an in-depth discussion in this video, Confronting data-mart fragmentation problems, part of Transitioning from Data Warehousing to Big Data.
- We've seen how the basic data warehouse architecture is very straightforward and relatively simple. Let's look at a more representative picture, though still a straightforward one that's easy to follow. Here we see eight different applications that we need data from for our reports and analytics, and we feed information from those eight applications into an enterprise data warehouse. Very often, though, we don't stop there. What happened a lot in the early days of data warehousing was that after an enterprise data warehouse was built, the planners would then pull certain subsets of the data into what were called data marts.
And you can think of this model here as the source systems being the suppliers of data and the enterprise data warehouse essentially a wholesaler of data, but the end users would go to the retail locations, in other words, the data marts, when it was time for them to build their reports and dashboards. In this particular scenario, despite the fact that our enterprise data warehouse has information from all these different source systems, people who need information about the Sales organization would go to the Sales data mart. Likewise, Supply Chain people would go to the Supply Chain data mart, and the people from Finance and Accounting would go to the Finance data mart.
What happens, though, is very much the same thing that we saw earlier when it came to predictive and discovery analytics. As soon as data started getting pulled out of the data warehouse, rather than just coming into a data warehouse, many organizations would begin to bypass the enterprise data warehouse. You can think of this as the enterprise data warehouse being a car that's going very slowly in the middle lane. So what would you do then? You would go ahead and pass it.
If the process to add data into the enterprise data warehouse and restructure the enterprise data warehouse was somewhat slow and cumbersome, as we've seen, organizations, as they needed to build out their Sales data mart, or their Supply Chain data mart, or any other data mart, would very often bring in their own source data. This, in and of itself, isn't necessarily all that bad, but what often happened in many organizations was that the enterprise data warehouse would actually disappear, and what was left was nothing but a set of independent data marts that would each draw their data from the source systems they needed.
If this picture looks kind of familiar, you're right. Just like with extract files in the 1970s and 1980s, before data warehousing appeared on the scene, the heavy usage of data marts without enterprise data warehouses in the middle resulted in this mish-mash of data feeds from all the different source systems. And what would typically happen then is the predictable lack of standardization of data, disagreement among the various data marts about what data actually meant, and the lack of any sort of enterprise conformity.
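To make that disagreement concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not from the course: the order records, the function names, and the two business rules are assumptions chosen only to show how two independent data marts can read the same source system and still report different numbers for "revenue."

```python
# Hypothetical order records from one source system (illustrative data only).
orders = [
    {"id": 1, "amount": 100.0, "status": "shipped"},
    {"id": 2, "amount": 250.0, "status": "returned"},
    {"id": 3, "amount": 75.0, "status": "shipped"},
]


def sales_mart_revenue(rows):
    """Assumed Sales mart rule: every booked order counts as revenue."""
    return sum(row["amount"] for row in rows)


def finance_mart_revenue(rows):
    """Assumed Finance mart rule: returned orders are excluded from revenue."""
    return sum(row["amount"] for row in rows if row["status"] != "returned")


# Each mart loads the same source data but applies its own definition.
sales_total = sales_mart_revenue(orders)
finance_total = finance_mart_revenue(orders)

print(f"Sales mart revenue:   {sales_total}")
print(f"Finance mart revenue: {finance_total}")
print(f"Unreconciled gap:     {sales_total - finance_total}")
```

With no enterprise data warehouse in the middle to hold one agreed-upon definition, neither number is wrong by its own mart's rules, which is exactly why the two reports are hard to reconcile.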
The end result is what almost every organization today deals with: the fragmentation of data marts all across their enterprise. You'll find varying definitions and varying business rules. In fact, most organizations have a variety of different hardware and software platforms for their various data marts, so what they have is a federation of non-integrated point solutions that do some good when it comes to specific reports and analytics, but don't necessarily serve the cross-functional needs and the heavy-duty analytics needs that most organizations require today.
- Exploring big data, Hadoop, and analytics
- Examining the shortcomings of traditional data warehousing
- Comparing big data architectures for next-generation data warehousing
- Understanding alternatives
- Building a roadmap
- Managing big data-driven projects
- Monitoring and measuring success