Join Lynn Langit for an in-depth discussion in this video What is a NoSQL database?, part of NoSQL for SQL Professionals.
- So by this point you're probably wondering what is a NoSQL database? In order to understand that, you need to understand a little bit about the history of where the NoSQL movement, if you will, came from. It really came from the big web players, in particular, Amazon and Google, and what they did in terms of data storage, is they created new types of databases for web scale. It really was driven by the V's of big data that we've talked about. The volume, the velocity, and variety because they were working with huge volumes of semi-structured data delivered at a very high velocity.
Another aspect of NoSQL that we'll talk about throughout this course, is the fact that Amazon and Google both published the results of their creations as papers that were then used by the open source community to make a lot of the NoSQL products that we'll be talking about. Interestingly, a little bit of trivia, the name NoSQL started at a meetup in San Francisco, greater San Francisco area, when a person who was interested in understanding more about these new types of databases, created a hashtag so that people could follow this meetup on Twitter.
So NoSQL is really more a set of characteristics than a particular defined thing, and it's a set of database solutions that are designed to host and work with non-relational data at its core. Now they can store relational data but they're really designed for non-relational, so either no structure or semi-structure. And they're databases that focus on scalability. So they can be sharded or partitioned, in some cases automatically, so they can scale very, very large and they can hold terabytes or even petabytes of data.
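To make the idea of sharding a bit more concrete, here is a minimal sketch of hash-based partitioning. It is only an illustration: the `shard_for`, `put`, and `get` helpers, the four-shard count, and the sample records are all assumptions made up for this example, and real NoSQL products use their own, more sophisticated partitioning schemes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for illustration

# Each shard is modeled as an in-memory dict; a real system would use separate servers.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key: str, value: dict) -> None:
    """Store a record on whichever shard its key hashes to."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Look a record up on the shard its key hashes to (None if absent)."""
    return shards[shard_for(key)].get(key)


# Semi-structured records: the fields do not have to match from record to record.
put("user:1001", {"name": "Ada", "clicks": 42})
put("user:1002", {"name": "Grace"})
```

Because the same key always hashes to the same shard, a read only has to visit one shard, which is part of what lets this style of storage spread across many machines.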
Now that being said, in order to work with non-relational and highly scalable data they do give up some of the characteristics that are found in relational databases. Most notably, transactional consistency of the data. So the data, because it's scaled, can be in an inconsistent state. This is a really important consideration that we'll be talking about through this course and it will color your thinking in when and where to use NoSQL databases. So to consider what we've talked about so far, in the current or past landscape we really had a couple of choices around Physical and Logical database storage.
So for Physical we either had our servers on premises or they were hosted somewhere. And for Logical, we picked from relational database storage that was by default OLTP or online transactional for our data that was inserted, updated, and deleted and then optionally we had an OLAP store which could be a copy of that database that was structured in a way for read optimization or it could be an entirely different product, such as Analysis Services for SQL Server or a data warehouse for Oracle, or so on and so forth.
Those were our set of choices. Now we've got a more complex set of choices that we'll be exploring throughout the rest of this course. First, from a Physical standpoint, our databases can be hosted on premise, in the Cloud, or what's called Hybrid, which is a private Cloud. In addition to the physical choices, we have logical choices around relational databases, NoSQL databases, Hadoop, and the file system, and we're going to be exploring all these different logical choices coming up next.
So you might be thinking, wow this is a lot of new information, a lot of new choices, why would I want to change? This is my database, this runs my business. What would cause me to want to make this kind of change? Well really the answer is around what I call successful, "Small" Big Data projects. That's where most of my clients, there's some exceptions around start-ups, but most of my clients really start in the "Small" Big Data realm. And from a practical point, they take their transactional data, they take a look at it, they understand if it's got integrity, if it's a good basis upon which to build some more data so that they can ask new kinds of questions and often they'll add some kind of behavioral data.
So one of the very concrete things that I end up doing in working with customers around NoSQL is trying to figure out what behavioral data is going to be relevant to the particular use case and then understanding the size of it, and then looking at the capacities of their current systems, and the cost to increase the capacities. You may also have interest in Public data and Premium data, as we discussed previously. So these are the four different areas of data. So this leads us to some NoSQL Architecture Questions that I hear commonly.
The first one is, "Should we even look at Hadoop or NoSQL?" And again coming up here shortly, I'll be talking about what's the difference between Hadoop and NoSQL. A lot of people think they're the same thing and they're, in fact, quite different things. They're alternatives to relational databases but not the same thing. The classic, "How much data is big data?" And the consultant answer is, it depends, and it really comes down to what are the capacities of your current systems? Not only the physical and logical capacities but what are also the capacities in terms of your staff or your people? Do you have one DBA? Do you have ten DBAs, so on and so forth.
So we'll talk about that from a practical point as we look at NoSQL solutions. What are the limits of SQL Server, or Oracle, or DB2, or whatever relational system you have? It could even be MySQL and how will the new data that I want to bring in, affect the limits and the performance of my current data on my SQL Server or whatever relational system I have? Are there relational alternatives to Oracle or SQL Server? For example, some clients that originally thought they were interested in NoSQL, ended up getting MySQL, which is an open source relational database, because they really just wanted a cheaper relational database.
They really didn't have a need, based on volume, velocity, and variety for a NoSQL Architecture or solution. Which NoSQL database (if any) should we consider? And this is going to be a big part of this course as we get into it. There are different kinds of NoSQL databases with different capabilities and understanding and matching your business question and data to the particular NoSQL database is really important. And how safe is the cloud? Again, I'm going to show you some cloud examples so you can get a sense of working with the cloud, so you can try it out yourself. How do we mine the data for usable information? A commonly overlooked question in NoSQL Architectures and Big Data Architectures is okay, great, we've got a place to put the data, we've figured out the destination, if it's Hadoop, NoSQL, relational, whatever, but there's not enough consideration given to how do we get the information out? How do we query, which query languages, which types of aggregations or mathematical operations do we need to perform on the data so that we can get meaningful information out? This is really a critical question.
It's not only the query language but it's also the visualizations. So I'll talk about this near the end of this course as well. I have found there are five key points to getting actual business value from NoSQL databases, and here they are. Number one, formulate business questions that are relevant and important to your particular business problems and base your entire project around the need to answer business questions. Number two, select the NoSQL solution and set it up. It's one thing to read a vendor's literature, it's quite another to get your engineers' hands on NoSQL solutions.
Sometimes, what is being said about these NoSQL products does not match your reality. So you need to try them out. Number three, find, load, and clean all your source data and this is a non-trivial exercise. In the old world of OLAP or reporting, setting up the OLAP projects would involve about 75 percent of the initial work just cleaning up the data and it's actually far more in these NoSQL projects because you're bringing in huge volumes of data that has less structure than you're used to.
So do not underestimate the importance of the cost associated with this and the way that you mitigate that is by trying it out with small sample sets and understanding the cost at a small scale and then forecasting it to the whole set. Number four, query the data. Understand the query patterns, as we just talked about. And number five, present the data. If you can set up a loop, kind of around the idea of lean start-up or minimum viable product, in this case, minimum viable report, so that you can get feedback from the business decision makers early in the cycle, you're going to have a greater chance of success with your NoSQL data project.
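The advice above about trying the cleaning step on a small sample and forecasting to the whole set can be sketched as simple arithmetic. This is only a rough illustration that assumes cleaning effort grows linearly with row count, which may not hold for real data; the `forecast_cleaning_hours` function and all of the figures are hypothetical.

```python
def forecast_cleaning_hours(sample_rows: int, sample_hours: float, total_rows: int) -> float:
    """Linearly extrapolate measured cleaning effort from a sample to the full data set."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    # Scale the measured hours by the ratio of total rows to sample rows.
    return sample_hours * total_rows / sample_rows


# Hypothetical figures: 6 hours to clean a 50,000-row sample of a 20,000,000-row data set.
estimate = forecast_cleaning_hours(sample_rows=50_000, sample_hours=6.0, total_rows=20_000_000)
print(f"Estimated cleaning effort for the full set: {estimate:.0f} hours")
```

If the estimate from a sample like this looks too expensive, that is useful to learn before committing to loading the full data set.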
- What is NoSQL?
- What is Hadoop?
- Exploring Redis, HBase, MongoDB, and Neo4j
- Exploring NoSQL features in Microsoft SQL Server
- Working with NoSQL data in the cloud
- Applying NoSQL choices to business scenarios
- Considering how data will be input and output at your business