Join Alan Simon for an in-depth discussion in this video, Examining data warehouse methodologies, part of Transitioning from Data Warehousing to Big Data.
- Data warehousing and business intelligence have been around since the early 1990s, which means that many different approaches have evolved. Any BI or data warehousing practitioner today has dozens of methodologies to choose from. Each of those many methodologies falls into one of two major camps. One camp revolves around a Data First philosophy. The other is primarily Functionality Oriented. Let's look at each of those in more detail.
The many different Data First methodologies disregard specific reports and functionality, and instead focus on what we know about our data as we decide how our data warehouse will be architected, structured, and then built. We'll focus on the key subject areas that happen to be in our many different source systems. We'll focus then on the dimensions, the various facts, and of course the business rules that govern the data. We don't focus, at least initially, on how the data's actually going to be used. The idea is to build out a data warehouse that, as best as it possibly can, will support a wide variety of reports and analytics, even before we go ahead and define those.
Methodologies that are Functionality Oriented focus initially on very specific reports, dashboards, and visualizations that we know we need, or that come out in requirement sessions with our business users. They will concentrate on things like the metrics, key performance indicators, and again, the business rules that govern all those different reports and dashboards. Only once all of these are defined will the key decisions about the data warehouse be made.
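To make the contrast between the two camps concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. Everything in it is an assumption made for illustration and does not come from the course: the sample rows, the table names (dim_product, fact_sales, rpt_monthly_revenue), and the function names are all hypothetical. The Data First version models a dimension and a fact table from what we know about the data, while the Functionality Oriented version builds a table shaped around one known report.

```python
import sqlite3

# Hypothetical source rows: (sale_date, product, category, units, revenue)
SALES = [
    ("2024-01-05", "Widget", "Hardware", 2, 20.0),
    ("2024-01-17", "Gadget", "Hardware", 1, 15.0),
    ("2024-02-02", "Manual", "Books", 3, 30.0),
]


def build_data_first(conn):
    """Data First: model a dimension and a fact table from the data itself."""
    conn.execute(
        "CREATE TABLE dim_product ("
        "product_id INTEGER PRIMARY KEY, name TEXT UNIQUE, category TEXT)"
    )
    conn.execute(
        "CREATE TABLE fact_sales ("
        "sale_date TEXT, product_id INTEGER REFERENCES dim_product(product_id), "
        "units INTEGER, revenue REAL)"
    )
    for sale_date, name, category, units, revenue in SALES:
        # Add the product to the dimension only if it is not already there.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (name, category),
        )
        (product_id,) = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (sale_date, product_id, units, revenue),
        )


def build_functionality_oriented(conn):
    """Functionality Oriented: build a table shaped for one known report."""
    conn.execute(
        "CREATE TABLE rpt_monthly_revenue (month TEXT, category TEXT, revenue REAL)"
    )
    totals = {}
    for sale_date, _name, category, _units, revenue in SALES:
        key = (sale_date[:7], category)  # "YYYY-MM" plus category
        totals[key] = totals.get(key, 0.0) + revenue
    conn.executemany(
        "INSERT INTO rpt_monthly_revenue VALUES (?, ?, ?)",
        [(month, category, rev) for (month, category), rev in sorted(totals.items())],
    )


def monthly_revenue_data_first(conn):
    """The known report, answered from the fact and dimension tables."""
    return conn.execute(
        "SELECT substr(f.sale_date, 1, 7) AS month, p.category, SUM(f.revenue) "
        "FROM fact_sales f JOIN dim_product p USING (product_id) "
        "GROUP BY month, p.category ORDER BY month, p.category"
    ).fetchall()


def monthly_revenue_report(conn):
    """The known report, read straight from the report-shaped table."""
    return conn.execute(
        "SELECT month, category, revenue FROM rpt_monthly_revenue "
        "ORDER BY month, category"
    ).fetchall()


def units_by_product(conn):
    """A question nobody asked up front; only the Data First model keeps units."""
    return conn.execute(
        "SELECT p.name, SUM(f.units) FROM fact_sales f "
        "JOIN dim_product p USING (product_id) GROUP BY p.name ORDER BY p.name"
    ).fetchall()
```

In this sketch both designs return the same monthly revenue figures, but only the Data First tables can answer units_by_product, because the report-shaped table never stored units or product names. The trade-off is the up-front modeling work the Data First design needed before any report existed.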
Regardless of the methodology, though, whether it's a Data First methodology or one that's Functionality Oriented, business rules are absolutely critical before we can go ahead and finish up what we're doing with our reports and our visualizations and our dashboards, or the data warehouse itself. We need to have a deep understanding of our organizational data. We also need to know, at least generally, how the organization will use the data. And we have to have the knowledge of how any given piece of data is related to other data.
So whether we start with data, or whether we start with functionality, at some point we have to have all of these things in place before designing and building a data warehouse. What this means, though, is that even though best practices for the past couple of decades have focused on either Data First or Functionality Oriented methodologies, both of these methodology camps share some disadvantages. First, data warehouses are built slowly. All of those definitions of business rules and all of that requirements analysis work take time.
Enhancing a data warehouse once it's already built is also time consuming. We build an architecture based on reports and analytics, or based on what we know about the data, but things tend to change over time. As our needs for our data evolve, and as the data itself evolves, making changes often requires going back to the drawing board, at least somewhat, and then re-architecting and maybe even partially rebuilding our data warehouse. What happens then is that, because data warehouses tend to be somewhat cumbersome and somewhat slow, separate, faster, fragmented solutions, namely data marts, often spring up to overcome some of these deficiencies and challenges.
And whereas that might seem to be a good solution in the short term, it actually has a number of long-term disadvantages, which we'll look at in a moment.
- Exploring big data, Hadoop, and analytics
- Examining the shortcomings of traditional data warehousing
- Comparing big data architectures for next-generation data warehousing
- Understanding alternatives
- Building a roadmap
- Managing big data-driven projects
- Monitoring and measuring success