From the course: Executive Guide to Predictive Modeling Strategy at Scale

Aggregate and restructure

- [Narrator] In my experience, some of the most important variables get generated when you convert a very tall transactional data set into a case-level data set. But you lose a lot of information when you go from a lot of rows to fewer rows, so what kind of information do you want to keep? A lot of this will seem obvious: you might compute total purchases, or median or mean purchases. What might be less obvious at first is that it won't be clear to the data scientist which of these variables is going to work best until they're in there assessing and exploring the data. So, for instance, somewhat famously in statistics, means are sensitive to outliers. The presence of just a few outliers can change the mean, but it doesn't change the median as much. Now, an experienced data scientist will just grab a few of these, but there are some that they won't know about until they look. For instance, looking for something like the number of…
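
To make the idea of rolling tall transactional data up to one row per case concrete, here is a minimal sketch using only the Python standard library. The data, the column meanings (a customer ID and a purchase amount), and the function name `aggregate_to_case_level` are all hypothetical, invented for illustration; they do not come from the course. It computes a count, total, mean, and median per customer so the outlier effect described above can be seen side by side.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transactional data: one row per purchase (customer_id, amount).
# Customer "B" has a single very large purchase acting as an outlier.
transactions = [
    ("A", 20.0), ("A", 25.0), ("A", 30.0),
    ("B", 20.0), ("B", 25.0), ("B", 900.0),
]


def aggregate_to_case_level(rows):
    """Collapse many transaction rows into one summary record per customer."""
    amounts_by_customer = defaultdict(list)
    for customer_id, amount in rows:
        amounts_by_customer[customer_id].append(amount)

    # Keep several candidate summaries; which one is most useful is
    # usually only clear after exploring the data.
    return {
        customer_id: {
            "n_purchases": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for customer_id, amounts in amounts_by_customer.items()
    }


case_level = aggregate_to_case_level(transactions)
for customer_id, summary in case_level.items():
    print(customer_id, summary)
```

In this made-up data, the two customers share the same median purchase (25.0), but the single 900.0 purchase pulls customer B's mean up to 315.0 while customer A's mean stays at 25.0. Keeping both columns in the case-level table lets the modeler compare them during exploration rather than committing to one up front.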
