From the course: Executive Guide to Predictive Modeling Strategy at Scale

Balancing

- [Instructor] What do you do if you're analyzing something that is highly out of balance? For instance, it's extremely unlikely that fraud represents half of our insurance claims. We certainly hope not. So perhaps you've got 20,000 fraudulent claims, but over the same period, two million legitimate claims. Balancing means forcing the numbers into balance by discarding, at random, some of the common cases. In this instance, legitimate claims are more common, so we discard some of them. I get into the theory of this in more depth in my introduction to classification course. For now, trust that modelers do this fairly often, that they know how to do it, and that they do it because it works. Let's attend to the following fact: we're suddenly down to just 40,000 cases, and we're only going to have half of those cases in our training data set. But don't you want to emphasize training…
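
To make the arithmetic in this example concrete, here is a minimal Python sketch of balancing by random discarding. It is an illustration under stated assumptions, not the instructor's implementation: the `undersample` helper, the integer claim IDs, and the fixed seed are all hypothetical, and the half-and-half training split simply mirrors the "half of those cases" mentioned in the transcript.

```python
import random


def undersample(majority, minority, seed=0):
    """Randomly keep as many majority cases as there are minority cases.

    Hypothetical helper for illustration; assumes the majority class is
    at least as large as the minority class.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept_majority = rng.sample(majority, k=len(minority))
    return kept_majority, list(minority)


# Made-up claim IDs that match the counts in the example:
# two million legitimate claims and 20,000 fraudulent ones.
legitimate = range(2_000_000)
fraudulent = range(2_000_000, 2_020_000)

kept_legit, kept_fraud = undersample(legitimate, fraudulent)

# After balancing, both classes contribute 20,000 cases each.
balanced_total = len(kept_legit) + len(kept_fraud)

# Assumption: half of the balanced cases go to training, as in the transcript.
train_size = balanced_total // 2

print(balanced_total, train_size)  # 40000 20000
```

Note how much data is set aside: 1,980,000 of the two million legitimate claims are discarded, which is why the training set ends up so small.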
