Join Dan Sullivan for an in-depth discussion in this video, Preparing data, part of Advanced NoSQL for Data Science.
- [Instructor] A common task in data science is preparing data. Let's review the core steps in data preparation. Collecting data is the first step of data preparation. We typically work with multiple data sets from different source systems. One factor to keep in mind is that different source systems will update data with different frequencies. So when you're planning a data science project, be sure to consider how frequently to update the data if you're storing it in a NoSQL database for further analysis. Also keep in mind business rules about consolidating or linking data.
Rules could be simple, such as dropping an older version of records, or complicated, such as a series of if-then rules specifying how to decide when to link records from different source systems. Another part of data collection that is easily overlooked is data retention. After some amount of time, data is no longer useful, or it should be deleted to comply with business policies. So you want to consider how you'll enforce data retention policies in your NoSQL database.
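The two simplest rules mentioned above, dropping older versions of a record and enforcing a retention window, can be sketched in plain Python before any data is loaded into a database. This is only an illustrative sketch: the field names (`customer_id`, `updated_at`, `source`), the sample records, and the 365-day retention window are assumptions made up for the example, not values from the course.

```python
from datetime import datetime, timedelta

def keep_latest(records, key="customer_id", ts="updated_at"):
    """Consolidation rule: keep only the newest version of each record."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        # Replace the stored record only if this one is strictly newer.
        if current is None or rec[ts] > current[ts]:
            latest[rec[key]] = rec
    return list(latest.values())

def apply_retention(records, now, max_age, ts="updated_at"):
    """Retention rule: drop records older than the allowed age."""
    cutoff = now - max_age
    return [rec for rec in records if rec[ts] >= cutoff]

# Hypothetical records collected from two source systems.
records = [
    {"customer_id": 1, "source": "crm", "updated_at": datetime(2017, 1, 10)},
    {"customer_id": 1, "source": "billing", "updated_at": datetime(2017, 2, 1)},
    {"customer_id": 2, "source": "crm", "updated_at": datetime(2015, 6, 1)},
]

consolidated = keep_latest(records)
retained = apply_retention(
    consolidated,
    now=datetime(2017, 2, 14),      # assumed "current" date for the example
    max_age=timedelta(days=365),    # assumed retention window
)
```

Filtering on the client like this is only one option. Some NoSQL databases can also expire data on the server side, for example with TTL indexes in MongoDB or time-to-live settings in Cassandra, which may be a better fit for enforcing a retention policy once the data is stored.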
Released 2/14/2017
The course begins with an introduction to NoSQL, and then delves into the specifics of document, wide-column, and graph databases. Learn key details for performing data preparation, exploration, and extraction for each type of NoSQL database. Review case studies that show how to use various NoSQL databases with popular data science tools, including the document database MongoDB, the wide-column database Cassandra, and the graph database Neo4j.
- NoSQL compared to traditional relational databases
- Performing common data science tasks
- Preparing data with document databases
- Manipulating data in NoSQL
- Preparing, exploring, extracting, and model building
- Working with document, wide-column, and graph databases
- Reviewing case studies using MongoDB, Cassandra, and Neo4j
Skill Level Advanced
Introduction
- Welcome (37s)
- Exercise files (39s)
1. Why NoSQL?
- Types of NoSQL databases (2m 20s)
2. Perform Common Data Science Tasks with NoSQL Databases
- Exploring data (3m 30s)
- Building models (3m 21s)
- Applying models (2m 6s)
3. Document Databases for Data Science
- Document data models (1m 35s)
- JSON structures (1m 53s)
- Install Anaconda (2m 4s)
- Install MongoDB (3m 40s)
- Working with Jupyter (2m 43s)
- Perform quality checks (5m 43s)
- Data frames in MongoDB (4m 48s)
4. Wide-Column Databases for Data Science
- Wide-column data models (2m 56s)
- Install Cassandra (1m 49s)
- Prepare data for Cassandra (6m 26s)
- Load data into Cassandra (4m 30s)
- Cassandra and Spark (1m 29s)
5. Graph Databases for Data Science
- Graph data models (1m 41s)
- Key graph concepts (2m 3s)
- Install Neo4j (1m 9s)
Conclusion
- Next Steps (35s)