Compare the data requests made by data scientists with those made by other, more typical users of enterprise-level data.
- [Instructor] For both understandable and appropriate reasons, most enterprise data management and data governance is designed around the typical user of enterprise data. When I attend and speak at the Data Warehouse Institute conferences, a lot of the specialists in the Data Warehousing area agree that about 80% of users of data are content with existing tables, data sources, reports, and dashboards.
The other 20% need some kind of extra attention. Most of these same experts put data scientists in the top one to three percent in terms of their need for custom requests and access to unusual data. It's worth pausing to ask why. As we've seen, they're often looking over long time horizons, and they should also be crossing departmental boundaries, pulling data from multiple silos. No routine monthly or quarterly report is going to do that.
Frankly, if they aren't at least considering unstructured data, they probably aren't trying hard enough at their part of the job. Granular requests will be common as well. I remember one of my favorite examples: on a term project, I wanted the billing detail, which no one had asked for since the billing statement's transactional history was already quite detailed. But I also needed unpaid transactional activity, like how often customers sent or received texts on an unlimited plan, because only with that could I understand the psychology and behavior of the phone user.
Unpaid transactions gave me insight into how customers spent time on their phones. So if you're pulling a quarter billion transactions for ten million phone customers, you might want the IT team to do that on a server on the data scientist's behalf. But the trick is that until the data scientist explores it, they can't be 100% certain of what they need. You'll still have to work together and try to be patient with one another. The data scientist ultimately has to explore the transactions file, and has to explore the customer file that is produced when all those transactions are manipulated and transposed.
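That "manipulate and transpose" step can be sketched in a few lines. Below is a minimal illustration, assuming hypothetical column names (`customer_id`, `event_type`, `amount`): each transaction row is rolled up so the customer file has one row per customer, with per-event-type counts plus a billed total. Unpaid activity, like texts on an unlimited plan, carries a zero amount but still shows up in the counts.

```python
# Sketch: transpose a transaction-level file into a customer-level file.
# Column names (customer_id, event_type, amount) are hypothetical.
import pandas as pd

def build_customer_file(transactions: pd.DataFrame) -> pd.DataFrame:
    """One row per customer: a count column per event type, plus total billed."""
    # Count events of each type per customer (the "transpose" step).
    counts = (
        transactions.pivot_table(
            index="customer_id",
            columns="event_type",
            values="amount",
            aggfunc="count",
            fill_value=0,
        )
        .add_prefix("n_")
    )
    # Billed amount; unpaid activity contributes 0 here but is counted above.
    totals = (
        transactions.groupby("customer_id")["amount"].sum().rename("total_billed")
    )
    return counts.join(totals).reset_index()

# Tiny example: customer 1 has one billed call and two unpaid texts.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "event_type": ["call", "text", "text", "call"],
    "amount": [0.25, 0.0, 0.0, 0.40],
})
customers = build_customer_file(tx)
```

At production scale, the same reshaping would run on a server-side engine rather than in memory, but the logic the data scientist needs to iterate on is the same.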
On any project, there are going to be surprises, and those surprises may force you to pull and transform the data again. If this happens two dozen times, the data scientist is probably new in the role, but even the most veteran modeler may have to make adjustments and pull the data more than once. It's one of the trickiest stages of any modeling project: getting all the people and all the machines working together to pull the data that will graduate on to the next steps, data prep and modeling.
Note: This course is software agnostic. The emphasis is on strategy and planning. Examples, calculations, and software results shown are for training purposes only.
- Evaluating the proper amount of data
- Assessing data quality and quantity
- Seasonality and time alignment
- Data preparation challenges
- Data modeling challenges
- Scoring machine-learning models
- Deploying models and adjusting data prep and scoring
- Monitoring and maintenance