Join Keith McCormick for an in-depth discussion in this video, Unexpected results, part of The Essential Elements of Predictive Analytics and Data Mining.
- One of the things you'll learn is that surprises are good. A common mistake in data mining is being too frugal with predictors, leaving out this or that variable because everybody knows that it's not a key driver. Or maybe a subject matter expert told you not to bother because it was not found to be a predictor last time someone tried. Not wise. Even if this is true, it discounts the insight that unanticipated interactions might provide.
Variables mix and combine in powerful ways. Also, leaving them out is a needless precaution because data mining algorithms are designed to be resilient to large numbers of related predictors. This is not to say that feature selection, which is essentially deciding what variables to keep or drop, is not an important skill. It is. Rather, data miners must be cautious when removing variables. Each of those variables costs the business money to record, and the insights that they might offer also have monetary value.
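To make the point about interactions concrete, here is a minimal sketch in Python. The data is made up for illustration: two hypothetical yes/no predictors, a and b, and an outcome that is 1 only when exactly one of them is set. It is not drawn from any real project. It compares the best rule that looks at one predictor alone with a rule that uses both.

```python
from itertools import product

# Hypothetical toy data: every combination of two binary predictors, repeated.
# The outcome is 1 only when exactly one flag is set (an XOR-style interaction).
rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 25


def accuracy(predict):
    """Share of rows where predict(a, b) matches the outcome."""
    return sum(predict(a, b) == y for a, b, y in rows) / len(rows)


def best_single_variable_accuracy(index):
    """Best accuracy of any rule that looks at only one predictor."""
    # All four possible rules on a single binary input:
    # copy it, flip it, always say 0, always say 1.
    rules = [lambda v: v, lambda v: 1 - v, lambda v: 0, lambda v: 1]
    return max(accuracy(lambda a, b, r=r: r((a, b)[index])) for r in rules)


acc_a_alone = best_single_variable_accuracy(0)
acc_b_alone = best_single_variable_accuracy(1)
acc_combined = accuracy(lambda a, b: a ^ b)

print(acc_a_alone, acc_b_alone, acc_combined)
```

In this contrived case, each predictor looks useless on its own, no better than a coin flip, yet the pair predicts the outcome perfectly. Real data is rarely this clean, but it shows why "everybody knows it's not a key driver" is a risky reason to drop a variable.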
Doing variable reduction well in data mining is strikingly different from doing it in statistics. Statistics training sometimes emphasizes parsimony, which is basically "keep it simple." There are a lot of good reasons to do this, but data mining algorithms are designed to handle lots of variables. When we leave a variable out, we can sacrifice accuracy. Worse than that, we deny ourselves one of the most powerful experiences that these projects can provide.
Surprises. When we encounter one, it makes the model more accurate, but we might also be able to put that insight directly to use. So please don't be overly cautious. Let the algorithm do its job, cross your fingers, and hope for a surprise or two.