Join Lynn Langit for an in-depth discussion in this video The 5 categories of NoSQL databases, part of NoSQL for SQL Professionals.
- As I mentioned previously, there are over 150 different kinds of NoSQL databases, and it's literally impossible to understand all the different implementations. So, a technique that I've found that helps me to understand the capabilities and to select the appropriate NoSQL database is to categorize. And these categories are commonly used in the industry, and everybody has their different flavor of them, so I'll take a minute to go through them at a high level and then we'll spend the rest of this section going through what they exactly mean, what types of NoSQL databases are found in these categories, and then I'll actually do a quick demo using a web framework for most of these, so there's no setup.
Just so you can see what it's like to interact with the different types of databases. So they're laid out in a particular order, so on the very left you have Key/Value, Volatile or Persistent. And what that means is you have some key and some value. They're completely schemaless, and they're either volatile, in memory, or persistent on disk. And really, in the old days, this was really not a database. This was a hash table. And the reason this has become a type of database is because of big data. So, it's one of the V's, the volume of data.
An example of this: I worked in Europe, and somebody was talking about the healthcare system of an entire country. And they load up the customer ID and the customer name in a volatile key/value store for quick lookups. So it's just a really, really huge dictionary or lookup that can be sharded, or spread across many, many servers. So in the center, we have Wide-Column. So, Wide-Column, which we've talked about so far as an abstraction over Hadoop, which is HBase, is where you have a key and then you have a column that has some values in it.
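To make the key/value lookup from the healthcare example concrete, here is a minimal sketch in Python of a volatile store that is sharded by hashing the key. The class name `ShardedKeyValueStore`, the `patient:` key format, and the sample names are all assumptions made up for illustration; a real product such as Redis handles sharding, networking, and persistence very differently.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy volatile (in-memory) key/value store spread over several shards.

    Hypothetical illustration only: each dict stands in for one server.
    """

    def __init__(self, shard_count: int = 4) -> None:
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Hash the key so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._shard_for(key).get(key, default)


# Load ID -> name pairs, then look one up by key.
store = ShardedKeyValueStore(shard_count=4)
store.put("patient:1001", "Ada Example")
store.put("patient:1002", "Grace Example")
print(store.get("patient:1001"))
```

The store knows nothing about the shape of the value; it only maps a key to whatever was stored, which is why this category is described as schemaless.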
You'll notice in the example shown that in the value section, there is no common schema. It's not as if you have to have everything the same, or rectangular, or relational. Wide-Column means just that, the columns can vary in width. The next type of database is Document, and most commonly this is used with JSON data, and we talked about this in an earlier module, but an example is shown where you have text data that has a structure to it, it kind of looks like an XML structure in that it has identifiers and then values, and then you have multiple levels of nesting, and it uses curly braces and quotes for its delimiters.
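As a rough sketch of the two shapes just described, the Python below models a wide-column table as rows whose column sets differ, and parses a small nested JSON document. The row keys, column names, and document fields are invented for illustration and are not taken from the slide or from any particular product.

```python
import json

# Wide-column style: each row key maps to its own set of columns.
# Rows do not have to share a schema, so the "width" varies per row.
wide_column_table = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Grace", "city": "Lisbon", "last_login": "2015-03-01"},
}


def columns_for(row_key: str) -> list:
    """Return the column names present on one row, sorted for display."""
    return sorted(wide_column_table[row_key].keys())


# Document style: text with identifiers, values, and several levels of
# nesting, delimited by curly braces and quotes.
document_text = """
{
  "customer": {
    "name": "Ada",
    "orders": [
      {"id": 1, "items": [{"sku": "A-100", "qty": 2}]}
    ]
  }
}
"""

document = json.loads(document_text)
first_sku = document["customer"]["orders"][0]["items"][0]["sku"]

print(columns_for("user:1"))
print(columns_for("user:2"))
print(first_sku)
```

The two rows report different column lists, and the document is navigated by walking down its nesting rather than by joining tables.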
Now that being said, there are other types of encodings. BSON is another one, which is Binary JSON, a binary serialization of JSON-like documents, as well as other Document encodings that are used, but JSON is the most common. And then Graph is a database that captures not only entities or objects, but it captures the relationships. So I call this the noun verb database, and so an example of this is a database that will capture social media information, so your friends and then how they're related to their friends and so on and so forth.
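The "noun verb" idea can be sketched in Python as a list of (noun, verb, noun) relationships plus a traversal that finds friends and friends of friends. The people, the `FRIEND_OF` label, and the function names are hypothetical; a graph database such as Neo4j would express this traversal in its own query language rather than in application code.

```python
from collections import deque

# Each relationship is stored as (noun, verb, noun).
relationships = [
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "FRIEND_OF", "carol"),
    ("carol", "FRIEND_OF", "dave"),
    ("alice", "FRIEND_OF", "erin"),
]


def neighbors(person: str) -> set:
    """Return everyone directly connected to person by FRIEND_OF."""
    found = set()
    for source, verb, target in relationships:
        if verb != "FRIEND_OF":
            continue
        if source == person:
            found.add(target)
        elif target == person:
            found.add(source)
    return found


def friends_within(person: str, max_hops: int) -> dict:
    """Breadth-first traversal: map each reachable person to a hop count."""
    distances = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand past the hop limit
        for friend in neighbors(current):
            if friend not in distances:
                distances[friend] = distances[current] + 1
                queue.append(friend)
    del distances[person]  # the starting person is not their own friend
    return distances


print(friends_within("alice", 2))
```

With a limit of two hops, the traversal reaches alice's direct friends and their friends, and stops before dave, who is three hops away.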
So you want to choose NoSQL when you're bringing new data into your project - higher volume and/or variety are the two big drivers - and it'd be too expensive for you to buy more, for example, SQL Server licenses, just to store a whole bunch of JSON behavioral data in your SQL Server as string data or XML or something. That's a very common use case. Another case would be that you need a Graph-type traversal, because your business question is around the relationships between the data nodes, and although you could do it in a relational or SQL Server environment, it would be really arduous or cumbersome to write those queries, whereas it's very easy, as you'll see, in the world of Graph databases.
If your data is unstructured or semi-structured, again this goes to behavioral data, we were just talking about that. If your team is ready to train in the new technologies, this is particularly for the Enterprise, where you've had Enterprise DBAs and Enterprise developers working with relational for years and years, you literally have to make room in your head, or unlearn all the techniques that you know, in order to successfully work with the NoSQL databases. The trickiest aspect that I've found is for developers to understand that eventual consistency means just that, it means no formal transactions.
It seems really, really tricky for developers to grasp that component. In terms of DBAs, one thing that I find that they struggle with is differences in how availability is managed, and how queries are tuned and optimized, across the NoSQL systems. And a tip that I have is to consider the use of the cloud to accelerate both your proofs of concept and your full implementations. If you do a POC in the cloud, the cloud is a much easier place to scale that POC into a full implementation.
I've had many calls from customers who have done a POC on premises, some developer got it to work on a Linux box, and they then wanted to go ahead and implement it into production, and they really didn't have the skills at the team level in order to do that, and if they had started in the cloud, it would just be a lot simpler to just scale it out. And finally, if you have enough information to correctly select the type and product of NoSQL for your situation. This is a very, very complex landscape, that's probably why you're listening to this course, and I spend the majority of my professional work helping my clients to figure out what will be the correct solution, and it's really not a cut-and-dried case, you have to iterate, you have to tinker, you have to build some minimum viable products, you have to see what kind of data you actually have, what your business questions are, there are a lot of variables that go in.
So one tip that I have is, select a couple of candidate technologies, up to three, and then build out your POCs on different types of NoSQL databases, just to see what the actual implementation is like, how difficult it is, how you get your value, and usually you'll come to some sort of conclusion, after you actually build it out.
- What is NoSQL?
- What is Hadoop?
- Exploring Redis, HBase, MongoDB, and Neo4j
- Exploring NoSQL features in Microsoft SQL Server
- Working with NoSQL data in the cloud
- Applying NoSQL choices to business scenarios
- Considering how data will be input and output at your business