Join Duane Nickull for an in-depth discussion in this video An overview of NoSQL and graphic databases, part of Up and Running with Neo4j.
The first chapter is about getting started and understanding what a graph database is. We're going to go through and look at what a graph database is, the competitive landscape, and why we chose Neo4j to illustrate. And then we're going to look at the idea of NoSQL and other graph databases, and look at the evolution. A graph database is simply a database that has graphs stored as the internal structure. Graphs themselves are made up of nodes and relationships.
Unlike a relational database (RDBMS), which typically stores data in tables containing rows and columns, a graph database stores nodes, and these nodes have properties. Properties can include unique identifiers or simple facts about the node. A node can represent just about anything in the real world. A node can represent a human being. It could represent a piece of property, or a financial amount.
The relationships connect the nodes, and they too have properties. It's important to understand that nodes are organized by relationships. When you query a graph database, once you select your graph, you can choose to traverse the paths between nodes based on the unique properties of the relationships. For a simple example of this, think of your social graph in Facebook. You have a node representing yourself, and there are other nodes representing your friends.
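As a rough illustration of the social-graph example, here is a minimal Python sketch of nodes and relationships that both carry properties. It is not how Neo4j stores data internally; the class names `Node`, `Relationship`, and `Graph`, the people's names, the `since` property, and the `WORKS_WITH` type are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int      # node_id of the node the relationship starts from
    end: int        # node_id of the node the relationship points to
    rel_type: str   # e.g. "IS_A_FRIEND_OF"
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def neighbors(self, node_id, rel_type):
        """Nodes reached from node_id over outgoing relationships of rel_type."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


social = Graph()
social.add_node(1, name="You")
social.add_node(2, name="Alice")
social.add_node(3, name="Bob")
social.relate(1, 2, "IS_A_FRIEND_OF", since=2012)
social.relate(1, 3, "IS_A_FRIEND_OF", since=2013)
social.relate(2, 3, "WORKS_WITH")

# Traverse only the friendship relationships that start at "You".
friend_names = [n.properties["name"] for n in social.neighbors(1, "IS_A_FRIEND_OF")]
print(friend_names)
```

The point of the sketch is that the query starts from one node and follows only the relationships of a chosen type, rather than searching every record.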
And they are organized by the relationships that connect the two. The relationship in this case would have the type IS_A_FRIEND_OF. As you can see, from 2013 to 2014 alone, graph databases have become the preferred method for storing data within the NoSQL movement. In fact, graph databases have taken a giant leap just in the last quarter of 2013. As we can clearly see here, when we narrow the view from all databases to graph databases only, Neo4j, represented by this top blue line, has clearly become the leader within the graph database category.
It continues to grow at a rate faster than most of the other graph databases, although some challengers, such as ArangoDB, seem to have strong uptake. My personal opinion is that Neo4j will remain the leader for quite some time to come. When you look at these, you may think that these are graphs, but they're not, they're charts. Neo4j does not store charts. Graph theory first started with Leonhard Euler. He was a mathematician who set out to solve the seven bridges problem of a town called Königsberg.
Königsberg had exactly seven bridges and four land masses. The challenge was to start on a land mass, cross each bridge exactly once, and end up on the same land mass. Through his studies, Euler quickly realized that the size and shape of the land masses, the length of the bridges, and how long a person walked between bridges were actually irrelevant. The only things that mattered were the nodes, i.e. the land masses, and the bridges, which are simply relationships between nodes.
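Euler's reduction can be sketched in a few lines of Python. The land-mass labels (`N`, `S`, `K`, `E`) and the function names are chosen here for illustration. The check relies on the standard result that, in a connected graph, a walk that crosses every relationship exactly once and returns to its starting node exists only if every node has an even number of relationships.

```python
from collections import Counter

# Land masses (labels assumed for this sketch):
# N = north bank, S = south bank, K = central island, E = eastern island.
# Each pair is one of the seven bridges.
BRIDGES = [
    ("N", "K"), ("N", "K"),
    ("S", "K"), ("S", "K"),
    ("E", "K"),
    ("N", "E"),
    ("S", "E"),
]


def degrees(bridges):
    """Count how many bridges touch each land mass."""
    count = Counter()
    for a, b in bridges:
        count[a] += 1
        count[b] += 1
    return count


def closed_walk_possible(bridges):
    """Even-degree condition; assumes the graph is connected."""
    return all(d % 2 == 0 for d in degrees(bridges).values())


deg = degrees(BRIDGES)
print(dict(deg))
print(closed_walk_possible(BRIDGES))
```

Every land mass touches an odd number of bridges, so the condition fails. Nothing about distances or shapes appears anywhere in the data.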
This was the very beginning of graph theory. Once the problem was simplified into nodes and relationships, it was evident that there was no solution. In fact, Euler proved that no such walk exists: the problem is mathematically impossible. Neo4j, almost 300 years later, started with the idea of a graph database implemented in Java. Neo4j was founded by a group in Malmö, Sweden, starting with the community edition, which was free.
The software has evolved substantially since its inception. It's now available in an enterprise-strength edition, which means that it supports high-availability (HA) clusters. The company is now headquartered in San Mateo, California. Neo4j has now matured to version 2.0. In fact, as of this slide deck, there is a 2.0.1 release available. It's very easy to install, as you'll see in a bit. The earlier versions required more knowledge to install, but the install mechanism has been simplified so much that just about anyone can install Neo4j.
Cypher, the query language, is also now very mature and, in a way, rivals SQL. For some queries, its performance cannot be matched by SQL on an RDBMS. Cypher can traverse millions of nodes very, very quickly, something that would take far longer with tabular data if you had to find a row in a table and join it to several other tables to traverse a path. Nodes, relationships, labels, and properties are absolutely ideal for many data models.
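To make the contrast between joining tables and traversing a path concrete, here is a toy Python sketch. It is neither Cypher nor a model of how Neo4j or any real RDBMS works internally (relational engines use indexes, for instance); the data and all function names are invented. It only shows the difference between re-scanning a table of rows on every hop and following each node's own relationships.

```python
from collections import defaultdict

# A "join table" of friendships: one row per (person, friend) pair.
FRIEND_ROWS = [
    ("you", "alice"), ("you", "bob"),
    ("alice", "carol"), ("bob", "dave"), ("carol", "erin"),
]


def hop_by_scanning(rows, people):
    """One hop, join-style: look at every row to find matches."""
    return {friend for person, friend in rows if person in people}


def build_adjacency(rows):
    """Graph-style storage: each node keeps direct references to its neighbors."""
    adjacency = defaultdict(set)
    for person, friend in rows:
        adjacency[person].add(friend)
    return adjacency


def hop_by_traversal(adjacency, people):
    """One hop, graph-style: touch only the relationships of the current nodes."""
    result = set()
    for person in people:
        result |= adjacency.get(person, set())
    return result


def reach(start, hops, step):
    """Apply `step` repeatedly and return the nodes found after `hops` hops."""
    frontier = {start}
    for _ in range(hops):
        frontier = step(frontier)
    return frontier


adjacency = build_adjacency(FRIEND_ROWS)
by_scan = reach("you", 3, lambda people: hop_by_scanning(FRIEND_ROWS, people))
by_walk = reach("you", 3, lambda people: hop_by_traversal(adjacency, people))
print(by_scan, by_walk)
```

Both approaches find the same people three hops away, but the scanning version reads the whole table on every hop, while the traversal version does work proportional only to the relationships it actually follows.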
- What are NoSQL and graphic databases?
- Examining basic graph modeling
- Using Cypher to query Neo4j
- Using paths to traverse multiple nodes
- Getting properties back from paths
- Using specific nodes
- Handling conditions
- Creating entities
- Deleting entities