From the course: Machine Learning and AI Foundations: Classification Modeling

Data reduction

- [Instructor] Okay, I wanna give you a brief introduction to a big topic, data reduction. You'll hear other phrases for this as well, like feature selection. So, what is it? It's the removal of poor and redundant predictors before modeling. There's yet another phrase that gets at this. Sometimes you'll hear people talking about optimal subsets. So, you've got your pool of variables, and are you choosing the right ones? Now, as you know, algorithms differ in how they tackle this. Some tackle it directly and others don't, which raises the following issue. How much do we have to worry about this? Aren't the algorithms taking care of it? After all, we know that trees and stepwise logistic regression, as well as stepwise discriminant analysis, don't use all the variables. They're supposedly picking the best ones. And what about a technique like neural nets? Now, that uses all the variables, so you would think this is more of an issue, perhaps, but if you think it through, what a lot of folks will say in…
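As a rough illustration of what "removal of poor and redundant predictors before modeling" can look like, here is a minimal Python sketch of a simple filter. It is not the instructor's method. It assumes "poor" means a predictor with (near) zero variance and "redundant" means a predictor that is almost perfectly correlated with one already kept. The function name `reduce_predictors`, the thresholds, and the toy data are all hypothetical.

```python
from math import sqrt


def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists with nonzero variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def reduce_predictors(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of predictors that survive a simple filter.

    Hypothetical helper: drops predictors whose variance is at or below
    min_variance ("poor"), then drops any predictor whose absolute
    correlation with an already-kept predictor reaches max_abs_corr
    ("redundant"). Both thresholds are assumptions, not recommendations.
    """
    kept = []
    for name, values in columns.items():
        if variance(values) <= min_variance:
            continue  # constant column: carries no information
        if any(abs(pearson(values, columns[k])) >= max_abs_corr for k in kept):
            continue  # near-duplicate of a predictor we already kept
        kept.append(name)
    return kept


# Toy data (made up): age_months is just age * 12, region_code never varies.
columns = {
    "age": [23, 35, 47, 52, 61],
    "age_months": [276, 420, 564, 624, 732],
    "region_code": [1, 1, 1, 1, 1],
    "income": [40, 38, 75, 52, 90],
}

kept = reduce_predictors(columns)
print(kept)
```

A filter like this runs before any model is fit and never looks at the target, so it only catches the most obvious cases. Algorithms such as trees or stepwise methods make their own selections during fitting.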
