Join Keith McCormick for an in-depth discussion in this video Degrade models, part of The Essential Elements of Predictive Analytics and Data Mining.
- [Instructor] The next element, and something you can count on happening, is that your models are going to degrade. It's inevitable. If the model remains static but the world is changing around it, the model's ability to make accurate predictions is going to degrade. But what can you do about it? Well, this is a potentially large subject, but there are a number of things that we can take as basic principles. First, a brief mention.
The Cross-Industry Standard Process for Data Mining, which we're going to discuss in the last chapter, has six phases that end with deployment. But over the years many have proposed that it would be useful to have a potential seventh phase called monitoring. And in a sense this is really the same topic. So one thing you have to concern yourself with is that on a routine basis, possibly even nightly (some models even do this in near real time), you have to double-check to make sure that the weights (or the coefficients, as we call them) are kept accurate.
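To make that routine check concrete, here is a minimal monitoring sketch in Python. It is not code from the course; it assumes a hypothetical scored table with an "actual" outcome column and a stored "predicted" column, and the baseline and tolerance values are purely illustrative.

```python
# Minimal nightly-monitoring sketch.
# Assumptions: a pandas DataFrame `scores` with columns "actual" and "predicted",
# plus a baseline AUC recorded when the model was deployed (illustrative values).
import pandas as pd
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.82   # illustrative value captured at deployment time
ALERT_MARGIN = 0.05   # illustrative tolerance before we flag degradation

def nightly_check(scores: pd.DataFrame) -> bool:
    """Return True if the model still performs within tolerance."""
    current_auc = roc_auc_score(scores["actual"], scores["predicted"])
    degraded = current_auc < BASELINE_AUC - ALERT_MARGIN
    if degraded:
        print(f"Model degraded: AUC {current_auc:.3f} vs baseline {BASELINE_AUC:.3f}")
    return not degraded
```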
Let me give you a tangible example. This is some data from the famous Titanic accident, where zero, shown in blue, are the folks that died, and one, shown in red, are the passengers that survived. Notice down in what's labeled as Node 14 in the lower left: the survival rate for those that were less than or equal to 13 years of age is 63%.
Well, that cutoff of less than or equal to 13 years of age could change over time. It could get older and become 14 or 15, or it could become younger and be 11. This is the notion of recalibrating, and most statistical and predictive analytics software will actually allow you to recalibrate (that's not always the term they use) these models on a regular basis. Over time our models are going to require more attention than just that.
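As a hedged illustration of recalibration (not the specific software shown in the video), refitting the same kind of decision tree on fresh data lets that split point move. The column names here follow the common Titanic dataset layout and are assumptions.

```python
# Recalibration sketch: refit a tree with the same settings on newer data and
# inspect where the age split lands. Column names ("age", "pclass", "survived")
# follow the usual Titanic layout and are assumptions, not the course's data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def recalibrate(new_data: pd.DataFrame) -> DecisionTreeClassifier:
    X = new_data[["age", "pclass"]].fillna(new_data[["age", "pclass"]].median())
    y = new_data["survived"]
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    # The learned split thresholds live in tree.tree_.threshold; on historical
    # data the age cutoff sits near 13, but on fresh data it could drift.
    print(tree.tree_.threshold[:5])
    return tree
```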
Eventually we'll want to remodel more thoroughly. Perhaps we want to consider new variables or different algorithms or different modeling techniques. So essentially what we're doing is we're still keeping the data preparation steps that we've put in place before. But we're allowing the algorithms to do their work again. Not simply recalibrating those values. So again this could include using variables that we didn't consider the last time and so on.
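One way to picture that remodeling step, keeping the existing data preparation but letting a different algorithm work again over an expanded variable list, is a scikit-learn Pipeline. Everything below is a hypothetical sketch, not the course's own code, and the feature names are assumptions.

```python
# Remodeling sketch: reuse the existing preparation step but swap in a
# different algorithm and variables not considered last time (names hypothetical).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier

OLD_FEATURES = ["age", "pclass"]
NEW_FEATURES = OLD_FEATURES + ["fare", "sibsp"]   # variables we didn't consider before

remodel_pipeline = Pipeline([
    ("prep", StandardScaler()),               # same preparation as before
    ("model", GradientBoostingClassifier()),  # a different modeling technique
])
# Example usage (assuming a training DataFrame `train` with a "survived" column):
# remodel_pipeline.fit(train[NEW_FEATURES], train["survived"])
```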
Eventually what's going to happen is your sources of data are going to change. You're going to transition to a new data warehouse. You're going to open up new lines of business. You're going to have competitors that you didn't have before. You're going to have sources of data, perhaps unstructured data or social media data, that somehow are being incorporated. Things have changed so much that you really have to go through all the phases of CRISP-DM again. That's a very different operation.
One hopes that this only happens about once every three to five years. But eventually you'll even have to do that.