From the course: Machine Learning and AI Foundations: Predictive Modeling Strategy at Scale

Assessing data

- [Instructor] Now what I would like to do is invite you to watch my thought process as I assess a data set from the modeler's point of view. This will give you an idea of why the modeler needs access to such a huge amount of data in the early part of the process, while they're still figuring out what the final form of the data will be. The first thing we're going to do is focus on the number of transactional rows per case, so let's take a quick look at some transactional data. I may even do this visually at first: I might see that this customer has just a few transactions, and, combining that with queries, I can see that this other customer has many more. Here are some of the questions I'm asking at this point: What is the maximum number of rows that I'm finding for any particular customer ID? And I might be looking at millions or tens of millions of transactions, or even more. I'm also curious if there are any…
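
The rows-per-case question raised in the transcript can be illustrated with a small sketch. This is not the instructor's query, only one possible way to count transactional rows per customer ID and find the maximum. The `transactions` list, its values, and the function names `rows_per_case` and `max_rows_per_case` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical transactional rows as (customer_id, amount) pairs.
# These values are made up for illustration; they are not course data.
transactions = [
    ("C001", 19.99),
    ("C001", 5.00),
    ("C002", 42.50),
    ("C003", 7.25),
    ("C003", 12.00),
    ("C003", 3.10),
    ("C003", 88.00),
]


def rows_per_case(rows):
    """Count how many transactional rows each customer ID has."""
    return Counter(customer_id for customer_id, _ in rows)


def max_rows_per_case(counts):
    """Return (customer_id, row_count) for the customer with the most rows."""
    return counts.most_common(1)[0]


counts = rows_per_case(transactions)
busiest_id, busiest_count = max_rows_per_case(counts)

print(f"customers: {len(counts)}, total rows: {sum(counts.values())}")
print(f"max rows for one customer: {busiest_count} ({busiest_id})")
```

With millions or tens of millions of transactions, the same count would more likely be done as a grouped aggregate inside the database rather than in memory, but the question being asked is the same.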
