Join Keith McCormick for an in-depth discussion in this video Understand laws 6, 7, and 8, part of The Essential Elements of Predictive Analytics and Data Mining.
- [Instructor] The next three laws are all on a similar theme of how data mining brings us value. The sixth law is the Insight Law. Data mining amplifies perception in the business domain. I've chatted with a couple of folks who found the phrasing a little unusual here. An interesting way of elaborating on what Tom is trying to get at is a quote from Steve Jobs that a lot of folks like.
It's really fascinating. What he was talking about was the efficiency of locomotion for various species on the planet. Condors and other creatures. And what he was saying is that humans came in with a rather unimpressive showing, about a third of the way down the list. That didn't look so good, but then someone at Scientific American had the insight to test the efficiency of locomotion for a man on a bicycle, and a man on a bicycle blew the condor away.
For Jobs, that's what a computer was, a bicycle of the mind. Something that takes us far beyond our inherent abilities. If you're intrigued with the quote, you can find it on the web and it's really something. Tom is getting at something very very similar. Remember that the whole reason that we do these projects is to solve problems in our businesses and in our organizations. As Tom puts it, this is similar to the concept of an intelligence amplifier. Early in the field of AI, it was suggested that the first practical outcomes would not be intelligent machines but rather tools which acted as intelligence amplifiers, assisting human users by boosting their mental capacities and therefore their effective intelligence.
Data mining provides a kind of intelligence amplifier, helping business experts to solve business problems in a way in which they could not achieve unaided. I think there's a really fascinating parallel between the two quotes.
The seventh law of data mining is the Prediction Law. This one, too, is phrased in a way that you might find unusual at first. Prediction increases information locally by generalization. What is Tom getting at here? Well, the term prediction has become the accepted description of what data mining models do.
We talk about predictive models and predictive analytics. Certainly we've been using this language throughout the course. But we should remain aware that this is not the ordinary everyday meaning of prediction. We cannot expect to predict the behavior of a specific individual or the outcome of a specific fraud investigation. But isn't that what we've been trying to do the whole time? What Tom is trying to say is that we're not so much producing a prediction as a propensity score, a range from high likelihood to low likelihood, something we've discussed.
As he puts it, what then is prediction in this sense? What do classification, regression, clustering, and association algorithms and their resultant models have in common? The answer lies in scoring. That is, the application of a predictive model to a new example. The model produces a prediction or score which is a new piece of information about the example. The available information in question has been increased locally. By locally, he means at the individual case: not across all of our data, but at that individual row.
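To make the scoring idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the logistic form, the coefficients, and the field names are hypothetical and do not come from Tom's laws or from any real model. The point is only that scoring attaches one new piece of information, a propensity, to a single row.

```python
import math

# Hypothetical coefficients of an already-fitted logistic model (illustrative only).
INTERCEPT = -2.0
WEIGHTS = {"prior_claims": 0.8, "claim_amount_thousands": 0.1}


def propensity(row):
    """Return a score between 0 and 1 for one example (one row)."""
    z = INTERCEPT + sum(WEIGHTS[name] * row[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def score(row):
    """Return a copy of the row with the score attached as a new field."""
    scored = dict(row)
    scored["propensity"] = propensity(row)
    return scored


# A new example the model has not seen before (made-up values).
new_case = {"prior_claims": 2, "claim_amount_thousands": 4.0}
scored_case = score(new_case)
print(scored_case)
```

The scored row carries everything the original row did plus the propensity, which is the sense in which information has been increased locally. The score is a likelihood, not a verdict on that specific case.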
The eighth law is the Value Law. Very straightforward but terribly important. The value of data mining results is not determined by the accuracy or stability of predictive models. It's natural, but we tend to focus on R squared and area under the curve and all this fancy stuff that our software tells us about the models. And we have to pay some attention to that because it's an easy way for us to process the relative merits of dozens of different models.
But when we get towards the end of the process, we really have to return our focus to the business. Data miners should not focus on predictive accuracy, model stability or any other technical metric at the expense of business insight and business fit.
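One way to see the Value Law in numbers is to compare two models on accuracy and on a business measure. The Python sketch below is a hypothetical illustration: the confusion-matrix counts, the benefit per caught case, and the cost per investigation are all made up for the example, and the simple net-value formula is an assumption, not something taken from the course.

```python
# Hypothetical confusion-matrix counts for two fraud models on the same 1,000 cases.
MODEL_A = {"tp": 10, "fp": 5, "fn": 40, "tn": 945}
MODEL_B = {"tp": 40, "fp": 100, "fn": 10, "tn": 850}

# Assumed business figures (illustrative only).
BENEFIT_PER_CAUGHT_FRAUD = 1000  # value recovered per true positive
COST_PER_INVESTIGATION = 50      # cost of looking into any flagged case


def accuracy(m):
    """Share of all cases the model classified correctly."""
    total = m["tp"] + m["fp"] + m["fn"] + m["tn"]
    return (m["tp"] + m["tn"]) / total


def net_value(m):
    """Recovered value minus the cost of investigating every flagged case."""
    flagged = m["tp"] + m["fp"]
    return m["tp"] * BENEFIT_PER_CAUGHT_FRAUD - flagged * COST_PER_INVESTIGATION


for name, model in (("A", MODEL_A), ("B", MODEL_B)):
    print(name, "accuracy:", accuracy(model), "net value:", net_value(model))
```

With these made-up figures, model A is the more accurate of the two, yet model B returns more net value because it catches far more of the costly cases. Different cost assumptions could reverse that, which is why the business framing has to come from the business.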