From the course: Machine Learning and AI Foundations: Classification Modeling

Imbalanced target categories

- [Instructor] Okay, let's talk about a very common problem: an imbalanced target variable. Since we're talking about binary classification, this is a concern when one of the two categories is more common than the other, and really only when the imbalance is dramatic. Keep in mind that this issue potentially affects all the algorithms. It's very easy to see in a decision tree, so I'm going to visualize it with you using a decision tree, but it affects the others as well. With that in mind, let's take a look at the Titanic data. We have a 38.5% survival rate, so conversely just over 61% of those in the data died. In other words, our target variable is not split 50/50. When is this a concern? Usually when the split is more dramatic: 70%, 80%, or 90% in one category. A really dramatic case that comes up quite regularly is fraud, where you might have a ratio of 200 to one, 500 to one, or even more. So what does it look like when you…
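One way to make "how imbalanced is it?" concrete is to compute each class's share and the majority-to-minority ratio, and to note how accurate a trivial model that always predicts the majority class would already be. The sketch below is not from the course. The function names `class_balance` and `majority_baseline_accuracy` are hypothetical, and the label lists are illustrative counts chosen to match the 38.5% survival rate and a 500-to-one fraud ratio, not the actual datasets.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the labels and the majority:minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    # Ratio of the most common class count to the least common class count.
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / sum(counts.values())

# Illustrative labels only: 1 = survived / fraud, 0 = died / legitimate.
titanic_like = [1] * 385 + [0] * 615
fraud_like = [1] * 2 + [0] * 1000

_, titanic_ratio = class_balance(titanic_like)
_, fraud_ratio = class_balance(fraud_like)

print(f"{titanic_ratio:.2f}")  # 1.60
print(fraud_ratio)  # 500.0
print(round(majority_baseline_accuracy(fraud_like), 3))  # 0.998
```

In the fraud-like case, always answering "not fraud" scores about 99.8% accuracy while never catching a single fraud, which is one reason a dramatic imbalance is a concern for any algorithm, not just decision trees.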