Join Dan Sullivan for an in-depth discussion in this video, Document data models, part of Advanced NoSQL for Data Science.
- We're now beginning our lessons on document database management systems. Let's review some of the most important features of these databases. Documents consist of a set of keys and values. Like relational tables, they can have optional indexes, which are useful for improving query performance. One especially useful feature is that attributes can vary across documents. Also, values may be complex structures, which can help us keep related attributes together as we denormalize.
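To make these features concrete, here is a minimal sketch that models documents as plain Python dictionaries. The customer names, fields, and the `name_index` lookup are hypothetical and chosen only for illustration; a real document database would store the documents and maintain its indexes for you.

```python
# Two hypothetical customer documents. Each is a set of keys and values.
customer_a = {
    "_id": 1,
    "name": "Alice",
    "email": "alice@example.com",
}
customer_b = {
    "_id": 2,
    "name": "Bob",
    # A complex value: an embedded structure keeps related attributes together.
    "address": {"city": "Portland", "state": "OR"},
    # Another complex value: an array.
    "phones": ["555-0100", "555-0101"],
}

collection = [customer_a, customer_b]

# Attributes vary across documents, so optional ones are read defensively.
cities = [doc.get("address", {}).get("city") for doc in collection]
print(cities)  # [None, 'Portland']

# A toy stand-in for an optional index: look up documents by name
# without scanning the whole collection.
name_index = {doc["name"]: doc for doc in collection}
print(name_index["Bob"]["_id"])  # 2
```

Note that the first document has no `address` attribute at all, which is why the lookup returns `None` for it rather than raising an error.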
A collection is a set of zero or more documents. Normally, each document has a unique ID, which can be auto-generated by the document database system or assigned by the developer. When we refer to a schema, we're talking about a schema that's inferred by looking at all of the documents and then creating a list of attributes that appear in at least one document. This way of thinking about attributes means we can add an attribute to the schema simply by creating a document with that attribute. Now, I should point out that this is both good and bad.
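The idea of an inferred schema can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular database implements it: the `insert` and `infer_schema` helpers are hypothetical, the collection is an in-memory list, and `uuid4` stands in for whatever ID generator a real system would use.

```python
import uuid


def insert(collection, doc):
    """Add a document, auto-generating an ID unless the developer assigned one."""
    if "_id" not in doc:
        doc = {"_id": uuid.uuid4().hex, **doc}
    collection.append(doc)
    return doc["_id"]


def infer_schema(collection):
    """Return every attribute name that appears in at least one document."""
    schema = set()
    for doc in collection:
        schema.update(doc.keys())
    return schema


collection = []
insert(collection, {"_id": 1, "name": "Alice"})          # developer-assigned ID
insert(collection, {"name": "Bob", "email": "bob@example.com"})  # auto-generated ID
schema_before = infer_schema(collection)

# Creating one document with a new attribute is enough to extend the schema.
insert(collection, {"name": "Carol", "loyalty_tier": "gold"})
schema_after = infer_schema(collection)

print(sorted(schema_after - schema_before))  # ['loyalty_tier']
```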
It's good because it allows us to add attributes as needed. It's bad, however, because if we make a spelling or typing mistake when entering an attribute name, we can unintentionally add a new attribute to the schema. We have to be careful to watch out for that. Documents may have complex structures that include arrays and embedded documents. This is useful when we are working with hierarchical data like orders, which have order items. Embedded documents are commonly used to support this kind of denormalization.
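Both points can be sketched together in Python. The order documents, the `order_total` helper, and the misspelled `custmer` attribute below are all hypothetical. Flagging attributes that occur only once is just one simple heuristic for catching typos, assumed here for illustration; it is not a built-in feature of document databases.

```python
from collections import Counter

# Each order embeds its order items as an array of embedded documents,
# so the whole hierarchy can be read without a join.
orders = [
    {
        "_id": 1,
        "customer": "Alice",
        "items": [
            {"sku": "A1", "qty": 2, "price": 5.0},
            {"sku": "B2", "qty": 1, "price": 12.5},
        ],
    },
    {"_id": 2, "customer": "Bob", "items": [{"sku": "A1", "qty": 1, "price": 5.0}]},
    # A typing mistake: "custmer" silently becomes a new attribute.
    {"_id": 3, "custmer": "Carol", "items": []},
]


def order_total(order):
    """Sum quantity times price over the embedded order items."""
    return sum(item["qty"] * item["price"] for item in order.get("items", []))


totals = {order["_id"]: order_total(order) for order in orders}
print(totals[1])  # 22.5

# Count how many documents use each top-level attribute name.
attribute_counts = Counter(key for order in orders for key in order)

# Attributes seen in only one document are worth a second look.
rare = sorted(name for name, count in attribute_counts.items() if count == 1)
print(rare)  # ['custmer']
```

In a larger collection, a threshold relative to the collection size would likely work better than an exact count of one, since legitimately rare attributes also exist.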
The course begins with an introduction to NoSQL, and then delves into the specifics of document, wide-column, and graph databases. Learn key details for performing data preparation, exploration, and extraction for each type of NoSQL database. Review case studies that show how to use various NoSQL databases with popular data science tools, including the document database MongoDB, the wide-column database Cassandra, and the graph database Neo4j.
- NoSQL compared to traditional relational databases
- Performing common data science tasks
- Preparing data with document databases
- Manipulating data in NoSQL
- Preparing, exploring, extracting, and model building
- Working with document, wide-column, and graph databases
- Reviewing case studies using MongoDB, Cassandra, and Neo4j