Join Keith McCormick for an in-depth discussion in this video Understand metamodeling, part of The Essential Elements of Predictive Analytics and Data Mining.
- [Instructor] The next element is metamodeling. You may have heard of so-called ensembles. They're very popular and very powerful. For instance, an ensemble of 1,000 decision trees is often called a random forest. But I'm referring to it here as metamodeling, because ensembles are just one approach to having models collaborate with each other. The idea of ensembles is to have more than one model, all generating propensity scores, so that they act much like a committee.
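The committee idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not a real trained ensemble: the three scoring functions stand in for trained models (a random forest would have hundreds or thousands of them), and the fields `gpa` and `attendance` are invented for the example.

```python
# A committee of models: each member produces a propensity score,
# and the ensemble averages them into one prediction.
# All three "models" below are hypothetical hand-written stand-ins.

def model_a(record):
    # one member might key on GPA
    return 0.8 if record["gpa"] < 2.0 else 0.2

def model_b(record):
    # another might key on attendance
    return 0.7 if record["attendance"] < 0.75 else 0.3

def model_c(record):
    # a third might combine both signals
    return 0.9 if record["gpa"] < 2.0 and record["attendance"] < 0.75 else 0.1

def committee_score(record, models=(model_a, model_b, model_c)):
    """Average the members' scores, like a committee vote."""
    return sum(m(record) for m in models) / len(models)

student = {"gpa": 1.8, "attendance": 0.6}
print(round(committee_score(student), 2))  # every member flags risk: 0.8
```

Because the members look at the data differently, their errors tend to cancel, which is why the committee usually beats any single member.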
Then, together, they make a more accurate prediction than any one of them would make alone. It's a powerful technique, but there are other approaches to having models collaborate with each other. Another approach is to have models in serial. Let me give you a quick example. Let's say that you've successfully built a model that identifies which underclassmen in a university setting are at risk of not graduating. Well, you could take those risk scores and say that all of the undergraduates with a risk score above six or seven out of 10, let's say, will receive tutoring.
However, do you know that tutoring would be a successful intervention for all of those at-risk students? There's a good chance that some of the reasons they're at risk of not finishing on time are financial in nature or involve other factors. So, you might build a second model that asks: given that a student is at risk, will our intervention strategy work for them? And you can see how these models would work together for a more complete result.
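The serial arrangement might look like the sketch below. It assumes the scenario from the example: model 1 scores graduation risk on a 0-10 scale, and only students above the threshold flow into model 2, which predicts whether tutoring will help. The scoring rules and field names (`gpa`, `credits_behind`, `financial_hold`) are hypothetical placeholders for trained models.

```python
# Models in serial: the second model only sees cases
# that the first model has already flagged as at-risk.

def risk_score(student):
    # model 1 (hypothetical): risk of not graduating, 0-10
    score = 0
    if student["gpa"] < 2.5:
        score += 5
    if student["credits_behind"] > 6:
        score += 4
    return score

def tutoring_will_help(student):
    # model 2 (hypothetical): given the student is at risk,
    # will tutoring work? A financial hold suggests the
    # barrier isn't academic, so tutoring likely won't fix it.
    return not student["financial_hold"]

def recommend_intervention(student, threshold=6):
    if risk_score(student) < threshold:
        return "no intervention"
    return "tutoring" if tutoring_will_help(student) else "other support"

print(recommend_intervention(
    {"gpa": 2.0, "credits_behind": 8, "financial_hold": False}))  # tutoring
```

The key design point is that model 2 is trained and scored only on the at-risk population, so it answers a conditional question the first model can't.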
There's a third way. Frankly, there are probably more than three, but these are three big categories of ways to do metamodeling. The third way is having models work in parallel. Often you need two different models not because of something about the business situation, but because of something about the data. For instance, you might have some data that's fairly complete and some data that has a lot of missing values in it.
So, you might build one model for the complete data and another model using an algorithm that handles the missing data in a different way. The data gets routed based on this criterion, purely a data criterion, and the scores are combined in a way that's essentially invisible to the end user. They just see that every case has a propensity score. Metamodeling is a bit of an art, but it's an extremely powerful technique.
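The parallel routing described above can be sketched as follows. This is an illustrative assumption, not a real deployment: the two scoring functions stand in for a model trained on complete cases and an algorithm assumed to tolerate missing values, and the field names are invented. The routing itself is purely a data criterion, and the caller just sees one score per case.

```python
# Models in parallel: route each record on completeness,
# score it with the matching (hypothetical) model, and
# return a single seamless propensity score.

FIELDS = ("gpa", "attendance")

def model_complete(record):
    # model A: uses every field; assumed trained on complete cases
    return 0.5 * (record["gpa"] < 2.5) + 0.5 * (record["attendance"] < 0.8)

def model_missing_tolerant(record):
    # model B: falls back to whatever fields are present
    present = [f for f in FIELDS if record.get(f) is not None]
    if not present:
        return 0.5  # no signal at all: neutral score
    flags = {"gpa": lambda v: v < 2.5, "attendance": lambda v: v < 0.8}
    return sum(flags[f](record[f]) for f in present) / len(present)

def score(record):
    """Route on a pure data criterion, invisibly to the end user."""
    complete = all(record.get(f) is not None for f in FIELDS)
    return model_complete(record) if complete else model_missing_tolerant(record)

print(score({"gpa": 2.0, "attendance": 0.9}))   # complete case: 0.5
print(score({"gpa": None, "attendance": 0.6}))  # routed to model B: 1.0
```

From the end user's point of view there is just one `score` function; the fact that two models sit behind it never surfaces.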
In my experience, no complete project that's gone all the way to deployment has ever involved a single model. They always involve groups of models working together.