Big data projects frequently rely on wild-caught data, which poses special challenges for data preparation, a step that typically consumes the majority of the effort on any data project.
- [Narrator] Salmon can be farmed or they can be caught wild. But either way, it takes a fair amount of work before they are turned into this. Everybody knows that food prep is an important, although time-consuming and frequently tedious, part of cooking. There is a similar principle in any big data project. The rule of thumb is that about 80 percent of the time on a big data project is spent preparing the data, and that's been my own experience.

Now, there are several reasons why this may be the case. It includes things like: how is the data entered? If you're using wild-caught data, meaning data that you found out there in the world and that maybe was entered with free text, you have to look at things like place names. Here are four different ways of indicating California. You can write it out, or you can use various abbreviations, and the inclusion of a period, at least by default, marks it as a separate answer from the one without a period. Or when people are putting in dates. …
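The place-name problem described in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four spellings of California and the `CANONICAL` lookup table are hypothetical, since the transcript does not list the exact variants shown on screen, and `normalize_place` is a made-up helper name rather than anything from the course.

```python
from collections import Counter

# Hypothetical free-text entries; the exact spellings are assumed for illustration.
raw_entries = ["California", "CA", "Calif.", "Calif"]

# Hypothetical lookup from a cleaned-up variant to one canonical name.
CANONICAL = {
    "california": "California",
    "ca": "California",
    "calif": "California",
}


def normalize_place(value: str) -> str:
    """Trim whitespace, lowercase, and drop periods before looking up the name."""
    key = value.strip().lower().replace(".", "")
    # Unknown values pass through (trimmed) so nothing is silently lost.
    return CANONICAL.get(key, value.strip())


# By default, every spelling is tallied as its own answer.
raw_counts = Counter(raw_entries)

# After normalization, the variants collapse into a single answer.
clean_counts = Counter(normalize_place(v) for v in raw_entries)

print(len(raw_counts), "distinct answers before cleaning")
print(len(clean_counts), "distinct answers after cleaning")
```

A lookup table like this only covers the variants someone has already noticed, which is part of why preparation of free-text data tends to take so much time.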
Released 9/19/2019
- Identify the components that make up big data.
- Examine how big data has grown over the last few years.
- Explain the importance of using big data in business organizations.
- Distinguish between knowledge requirements for using big data and for understanding data science.
- Justify the need for training on big data within an organization.
- Analyze the factors that go into utilizing big data on a project.
- Differentiate outcomes that are derived from big data from outcomes that are derived from observing behaviors.
Skill Level Beginner
Introduction
- How big data shapes AI (1m 46s)
1. Defining Big Data
2. How Is Big Data Used?
- Big data for applications (4m 41s)
3. Big Data and Data Science
4. Ethics in Big Data
- Big data and privacy (5m 52s)
- Data governance (6m 2s)
5. Data Logistics
- An evolving data landscape (5m 48s)
6. Analyzing Big Data
- Visualizing big data (5m 13s)
- Data mining (4m 39s)
- Text analytics (4m 18s)
- Sentiment analysis (4m 48s)
- Predictive analytics (4m 7s)
- Anomaly detection (3m 59s)
Conclusion
- Next steps (2m 56s)
Video: Challenges with data preparation