Join Duane Nickull for an in-depth discussion in this video, Exploring Neo4j technology, part of Up and Running with Neo4j.
In this video, we're going to look at what makes Neo4j relatively unique, and afterwards we'll install it. Neo4j is a graph database. To recap, a graph database organizes data as nodes connected by relationships. Querying is the other part of it: to query, you traverse the nodes and relationships. Navigating a graph can therefore be expressed as starting with a node and then exploring which other nodes are connected to it.
This can be a very simple query, such as finding a node that is connected to only one other node, or a more complex one, such as defining a pattern of nodes and searching the entire graph database for that same pattern. You can also identify paths; a path query searches for the path pattern across all nodes in the database. Paths can be ordered, symmetrical, asymmetrical, and so on. Let's consider a very simple graph.
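Traversal can be pictured with a few lines of plain Python. This is a minimal sketch, not Neo4j's API: the graph is a hypothetical dictionary mapping each node to the nodes its outgoing relationships point to, and a breadth-first search follows those relationships from a start node to find a shortest path to a goal node.

```python
from collections import deque

# Hypothetical directed graph: node -> nodes reachable over one outgoing relationship.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def neighbors(graph, node):
    """Nodes connected to `node` by an outgoing relationship (one hop)."""
    return graph.get(node, [])


def find_path(graph, start, goal):
    """Breadth-first traversal; returns a shortest node path, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors(graph, node):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(neighbors(GRAPH, "A"))       # ['B', 'C']
print(find_path(GRAPH, "A", "D"))  # ['A', 'B', 'D']
print(find_path(GRAPH, "D", "A"))  # None: relationships are only followed in their direction
```

Because the search is breadth-first, the first path that reaches the goal has the fewest hops. In Cypher, the same kind of question is expressed declaratively as a path pattern rather than as a hand-written loop.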
This graph contains three nodes: a book, Graph Databases (the node could represent several books if there were other book nodes), and two individual people, Alice and Bob. We can see that Alice has read the book on graph databases, and Bob has also read it. Alice is a friend of Bob, but note that this is a directional relationship: it doesn't necessarily mean the reciprocal is true. There's a lot more complexity to this than meets the eye.
This raises a question: is Bob aware that Alice has declared herself a friend of Bob? Is the reciprocal true? In this case, it is: Bob is also a friend of Alice. A relationship that holds in both directions like this is called symmetrical. You can now augment the graph with properties, which fill in more of the information. The FRIEND_OF relationship we just mentioned has existed since July 9, 2011, and it is reciprocated by Bob.
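The difference between a one-way and a symmetrical relationship can be checked mechanically. The sketch below is plain Python rather than Neo4j, and the edge list is a hypothetical encoding of the diagram as directed (start, TYPE, end) triples. `symmetric_pairs` scans every edge for the pattern "x FRIEND_OF y and y FRIEND_OF x", which in Cypher would be written roughly as `MATCH (a)-[:FRIEND_OF]->(b)-[:FRIEND_OF]->(a) RETURN a, b`.

```python
# Hypothetical encoding of the example graph as directed (start, TYPE, end) triples.
EDGES = [
    ("Alice", "FRIEND_OF", "Bob"),
    ("Bob", "FRIEND_OF", "Alice"),
    ("Alice", "HAS_READ", "Graph Databases"),
    ("Bob", "HAS_READ", "Graph Databases"),
]


def has_edge(edges, start, rel_type, end):
    """True if a relationship of rel_type runs from start to end."""
    return (start, rel_type, end) in edges


def is_symmetric(edges, a, rel_type, b):
    """A relationship is symmetrical only if it exists in both directions."""
    return has_edge(edges, a, rel_type, b) and has_edge(edges, b, rel_type, a)


def symmetric_pairs(edges, rel_type):
    """Search every edge for the pattern (x)-[rel_type]->(y)-[rel_type]->(x)."""
    pairs = set()
    for start, t, end in edges:
        if t == rel_type and has_edge(edges, end, rel_type, start):
            pairs.add(tuple(sorted((start, end))))
    return sorted(pairs)


print(is_symmetric(EDGES, "Alice", "FRIEND_OF", "Bob"))  # True
print(symmetric_pairs(EDGES, "FRIEND_OF"))               # [('Alice', 'Bob')]

# Drop Bob's side and the friendship becomes one-way.
one_way = [e for e in EDGES if e != ("Bob", "FRIEND_OF", "Alice")]
print(is_symmetric(one_way, "Alice", "FRIEND_OF", "Bob"))  # False
```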
Notice also the underscore between FRIEND and OF. In Neo4j, relationship types are conventionally written in uppercase with underscores instead of spaces; a name containing spaces would have to be escaped with backticks every time it's used in a query. The same is true for the HAS_READ relationship. You'll note that the HAS_READ relationship also has a date property and a rating, and the two ratings differ: Bob rated the book only four out of five, while Alice rated it five out of five.
Alice read the book on October 3, 2013, while Bob read it on March 2, 2013. Alice and Bob also both have age and name properties. Earlier we simply mentioned their names, but a name is actually a property of the node, and the nodes are represented by the rounded rectangles. The same is true for the book, which has now been augmented with a title, Graph Databases, and its authors.
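Properties live on both nodes and relationships. As a plain-Python sketch (not a Neo4j client), the reading history from the diagram can be stored as property dictionaries. The keys `alice`, `bob`, and `book` are hypothetical identifiers, and the ages are left out because their values aren't given. The query walks the HAS_READ relationships into the book and reads the properties stored on them.

```python
from datetime import date

# Node properties (hypothetical keys). Ages exist in the diagram but their
# values aren't stated, so they are omitted here.
NODES = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "book": {"title": "Graph Databases", "authors": ["Ian Robinson", "Jim Webber"]},
}

# Relationships carry their own properties: (start, TYPE, end, properties).
# Bob's reciprocal FRIEND_OF is omitted; this query doesn't need it.
RELS = [
    ("alice", "FRIEND_OF", "bob", {"since": date(2011, 7, 9)}),
    ("alice", "HAS_READ", "book", {"date": date(2013, 10, 3), "rating": 5}),
    ("bob", "HAS_READ", "book", {"date": date(2013, 3, 2), "rating": 4}),
]


def readers(rels, nodes, book_key):
    """(name, rating, date) for each HAS_READ relationship into book_key, earliest first."""
    rows = [
        (nodes[start]["name"], props["rating"], props["date"])
        for start, rel_type, end, props in rels
        if rel_type == "HAS_READ" and end == book_key
    ]
    return sorted(rows, key=lambda row: row[2])


for name, rating, read_on in readers(RELS, NODES, "book"):
    print(f"{name} rated {NODES['book']['title']} {rating}/5 on {read_on.isoformat()}")
# Bob rated Graph Databases 4/5 on 2013-03-02
# Alice rated Graph Databases 5/5 on 2013-10-03

ratings = [rating for _, rating, _ in readers(RELS, NODES, "book")]
print(sum(ratings) / len(ratings))  # 4.5
```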
In this case there are two authors, Ian Robinson and Jim Webber. Now consider how a relational database system might represent a relationship between customers and accounts. You might expect a direct relationship: an arrow or reference from a customer in one table to an account in the other table. But that's usually not how it's done. Once again, we'll use Alice as an example. She is customer number 143, and she has three accounts.
Account 326 has a balance of $100, account 725 has a balance of $632, and account 981 has a balance of $212. We now construct the join table, which simply joins the Alice entity to the account entities. You can see that it contains three references.
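The relational layout can be reproduced with Python's built-in `sqlite3` module. The table and column names (`customers`, `accounts`, `customer_accounts`) are hypothetical; the data is Alice's from the example. Retrieving her accounts means going through the join table, which takes two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (number INTEGER PRIMARY KEY, balance INTEGER);
    -- Hypothetical join table: one row per customer/account pair.
    CREATE TABLE customer_accounts (
        customer_id INTEGER REFERENCES customers (id),
        account_number INTEGER REFERENCES accounts (number)
    );
    INSERT INTO customers VALUES (143, 'Alice');
    INSERT INTO accounts VALUES (326, 100), (725, 632), (981, 212);
    INSERT INTO customer_accounts VALUES (143, 326), (143, 725), (143, 981);
""")

# Customer -> join table -> accounts.
rows = conn.execute("""
    SELECT a.number, a.balance
    FROM customers AS c
    JOIN customer_accounts AS ca ON ca.customer_id = c.id
    JOIN accounts AS a ON a.number = ca.account_number
    WHERE c.name = 'Alice'
    ORDER BY a.number
""").fetchall()

print(rows)                                 # [(326, 100), (725, 632), (981, 212)]
print(sum(balance for _, balance in rows))  # 944
```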
Those three entries in the tabular data represent the different relationships that Alice has with her accounts. You may ask why this is necessary. Consider a joint account that both Alice and Bob can access: that many-to-many case is what requires the join table, customer accounts in this case. The picture becomes simpler if we look at just the relevant slices of the tables. In a graph database, however, we can get rid of the intermediate join table entirely and connect Alice directly to each of her accounts.
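For comparison, here is a plain-Python sketch of the graph version (again, not Neo4j's API; the `Graph` class and its `out` and `into` methods are hypothetical). Each relationship is recorded on both of its endpoints, so Alice's accounts are found by following her outgoing OWNS relationships and an account's owners by following its incoming ones, with no join table in between. The joint account shared with Bob is also hypothetical, since the example doesn't say which account it would be.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph that stores each relationship on both endpoints."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # node -> [(TYPE, end node)]
        self.incoming = defaultdict(list)  # node -> [(TYPE, start node)]

    def relate(self, start, rel_type, end):
        self.outgoing[start].append((rel_type, end))
        self.incoming[end].append((rel_type, start))

    def out(self, node, rel_type):
        """Follow relationships of rel_type leaving node."""
        return [n for t, n in self.outgoing[node] if t == rel_type]

    def into(self, node, rel_type):
        """Follow relationships of rel_type arriving at node, backwards."""
        return [n for t, n in self.incoming[node] if t == rel_type]


balances = {326: 100, 725: 632, 981: 212}
g = Graph()
for number in balances:
    g.relate("Alice", "OWNS", number)
g.relate("Bob", "OWNS", 725)  # hypothetical joint account

print(g.out("Alice", "OWNS"))                            # [326, 725, 981]
print(sum(balances[n] for n in g.out("Alice", "OWNS")))  # 944
print(g.into(725, "OWNS"))                               # ['Alice', 'Bob']
```

Adding Bob as a co-owner is one more relationship; nothing resembling a join table has to be created or queried.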
Alice owns each account. If another entity, such as Bob, had access to an account, he could be represented as another node with a direct relationship to that account. The relationships can have the type OWNS. This is another application of graph theory. So why and when would you want to use Neo4j instead of tabular data? One of the problems that tabular data runs into with very large data sets is described by the CAP theorem.
CAP stands for consistency, availability, and partition tolerance. Consistency means that all data nodes or instances are in the same state. Availability means that the data-persistence mechanism must be able to return a response to every request, including operations such as two-phase commits, so that the writer receives a signal indicating whether the operation succeeded or failed.
Partition tolerance means that the data-persistence mechanism must keep operating even when parts of the system are cut off from one another, for example while state-alignment recovery activities are being performed. Imagine that a two-phase commit is in progress and fails. For a while, the system would actually reflect the wrong state. The CAP theorem states that only two of these three properties can be guaranteed at any one time.
You can't have all three, because two of them are optimized at the expense of the third. You can have high consistency, but then availability or partition tolerance would likely suffer. Likewise, availability could be made very good, yet things like failed two-phase commits could introduce state misalignments in the data you're searching. This is an example of where you would want to use a graph database.
A graph database can mitigate this type of problem. Another case where you'd want to use Neo4j is when you have large amounts of connected data that doesn't follow a single, uniform schema. Neo4j essentially derives the schema from the data it stores. In fact, you can modify the schema dynamically by adding new nodes, or new relationships to existing nodes, without having to rewrite the schema as you would in an RDBMS.
If you're at a startup and you're watching this, it's something you should consider seriously. When you start with a database schema, it's very unlikely you'll get it right on the first pass. Starting with Neo4j, you won't have to rewrite the database schema as often, because you have a flexible mechanism that can adapt, on an ad hoc basis, to the data you want to store. On the retrieval side, you can likewise write Cypher queries to access the data easily.
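The contrast can be seen in a small sketch. On the relational side, Python's built-in `sqlite3` rejects a value for a column the table doesn't declare until the schema is altered. On the graph side, modeled here as plain dictionaries rather than Neo4j itself, a new property or a new relationship type is attached to existing nodes directly. The `email` property and the `RECOMMENDS` relationship type are hypothetical examples.

```python
import sqlite3

# Relational side: columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
rejected = False
try:
    conn.execute("INSERT INTO people (name, email) VALUES ('Alice', 'alice@example.com')")
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)

# The schema has to change before the new field can be stored.
conn.execute("ALTER TABLE people ADD COLUMN email TEXT")
conn.execute("INSERT INTO people (name, email) VALUES ('Alice', 'alice@example.com')")

# Graph side (plain dicts, not Neo4j): new properties and relationship
# types are attached to existing nodes as the data arrives.
nodes = {"Alice": {"name": "Alice"}, "Graph Databases": {"title": "Graph Databases"}}
relationships = [("Alice", "HAS_READ", "Graph Databases")]

nodes["Alice"]["email"] = "alice@example.com"                     # hypothetical new property
relationships.append(("Alice", "RECOMMENDS", "Graph Databases"))  # hypothetical new type

print(nodes["Alice"])
print(sorted({rel_type for _, rel_type, _ in relationships}))  # ['HAS_READ', 'RECOMMENDS']
```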
It's not only about flexibility; it's also about the speed of traversals. Neo4j running on commodity hardware can scale to the point where it traverses 4 million nodes per second. That's a very impressive number. So who's using Neo4j? Here are a few of the companies. Consider Adobe. Adobe's Creative Cloud is a software subscription service.
Within a second of somebody logging on, Creative Cloud must be able to determine which subscriptions they have and which features they can access, and display a screen that gives that person the correct set of choices. The same is true for Deutsche Telekom. When a subscriber makes a phone call, looking up that person's account, the minutes they have left, and what they're allowed to do must be almost instantaneous, typically completing in the time it takes them to dial a number.
The lookup has to be thorough, it has to be accurate and reflect the current state, and it has to come back within seconds. At Cisco, a solution was needed in which cases, solutions, articles, and messages could be continuously scraped for cross-reference links. The result is represented in Neo4j as a graph. This application uses Neo4j Enterprise with a high-availability cluster, and as a result customers can get help faster, with less reliance on customer support.
Coming up next, we're going to start getting our hands dirty by installing Neo4j.