Join Duane Nickull for an in-depth discussion in this video, Examining basic graph modeling, part of Learning Neo4j.
In this chapter, we're going to start to learn about how to graph data, or more importantly, how to build data into graphs. It sounds like a fairly daunting exercise, but it's actually very simple. And it's simple because graphs are often constructed the way people think, not in the tabular form used in relational databases. You simply start by drawing a graph. Here we have a Facebook-type graph with two characters, Adam and Sarah.
We've drawn a line between Adam and Sarah, noting that they're friends. And we've drawn a line between Adam and another entity node called LOL Cat, representing a picture that Adam has shared. Sarah has an arrow pointing at LOL Cat, because Sarah likes LOL Cat. And Sarah has also left a comment on LOL Cat. Her comment is a node of its own, the one on the lower left-hand side, labeled funny.
And funny is linked to LOL Cat. So it's a comment on the photograph, or the picture. This can be easily turned into a formal graph. Here we see the same graph represented more formally. There is more to it than meets the eye, though. So if you look at Adam and LOL Cat, Adam has shared a picture of LOL Cat. The problem here arises from the fact that there is more than one copy of the LOL Cat photograph on Facebook.
Because Adam has shared it, he's recreated the original in his own timeline on Facebook. When you see the line between Sarah and LOL Cat with a label of likes, you have to ask the question, what does this mean? Does it mean that Sarah likes all instances of LOL Cat, or does Sarah like just this one instance of LOL Cat? Inside of Facebook, if this happened, the LOL Cat photo that Adam shared would be a unique instance, and Sarah likes only that instance, not all copies of the LOL Cat photograph.
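The instance question can be made concrete with a small sketch in plain Python (not Neo4j itself). The node ids, the relationship type names, and the second copy `lolcat_other` are assumptions made up for illustration; only the people, the picture, and the funny comment come from the graph described here.

```python
# Each node has its own id. "lolcat_adam" is the copy Adam shared;
# "lolcat_other" is a hypothetical second copy elsewhere on the site.
nodes = {
    "adam": {"name": "Adam"},
    "sarah": {"name": "Sarah"},
    "lolcat_adam": {"title": "LOL Cat"},
    "lolcat_other": {"title": "LOL Cat"},
    "funny": {"text": "funny"},
}

# Relationships as (start, type, end) triples. Type names are assumed.
relationships = [
    ("adam", "FRIEND", "sarah"),
    ("adam", "SHARED", "lolcat_adam"),
    ("sarah", "LIKES", "lolcat_adam"),
    ("sarah", "COMMENTED", "funny"),
    ("funny", "ON", "lolcat_adam"),
]


def likes(person, picture):
    """True only if a LIKES relationship points at that exact node."""
    return (person, "LIKES", picture) in relationships


print(likes("sarah", "lolcat_adam"))   # True
print(likes("sarah", "lolcat_other"))  # False
```

Because the like is attached to one node id rather than to the picture's title, it says nothing about any other copy, which is the reading the discussion above settles on.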
Likewise, Sarah's comment, the word funny, is also on the copy, this specific instance of LOL Cat. At this point, we can start to augment graphs further. This is yet another graph. It shows some actors and movies. Tom Hanks and Hugo Weaving, at the top, are both nodes representing actors. There are two movies, The Matrix and Cloud Atlas.
You can note here that Tom Hanks acted in Cloud Atlas, while Hugo Weaving acted in both Cloud Atlas and The Matrix. Lana Wachowski is a director represented by another node, on the lower side of this graph, and she has two outgoing relationships, pointing at The Matrix and Cloud Atlas. And you'll note that the relationships are all asymmetrical at this point: each one has a direction.
Hugo Weaving acted in The Matrix. It's important to note that The Matrix did not act in Hugo Weaving. Likewise, Lana Wachowski directed The Matrix and Cloud Atlas, but those movies did not direct her. Modeling is easy when you look at it from this standpoint. Simply drawing memes, or mind maps, with arrows can be a good start to modeling. The key here, though, is to model incrementally. Start with a very simple graph, and then start augmenting it.
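The direction of these relationships can be sketched in plain Python as (start, type, end) triples. This is only an illustration, not how Neo4j stores data; `ACTED_IN` is the relationship name used in this chapter, while `DIRECTED` and the helper `outgoing` are names assumed for the example.

```python
# Directed relationships as (start, type, end) triples.
relationships = [
    ("Tom Hanks", "ACTED_IN", "Cloud Atlas"),
    ("Hugo Weaving", "ACTED_IN", "Cloud Atlas"),
    ("Hugo Weaving", "ACTED_IN", "The Matrix"),
    ("Lana Wachowski", "DIRECTED", "The Matrix"),
    ("Lana Wachowski", "DIRECTED", "Cloud Atlas"),
]


def outgoing(node, rel_type):
    """End nodes of relationships of rel_type that start at node."""
    return [end for start, kind, end in relationships
            if start == node and kind == rel_type]


print(outgoing("Hugo Weaving", "ACTED_IN"))  # ['Cloud Atlas', 'The Matrix']
print(outgoing("The Matrix", "ACTED_IN"))    # []
```

Following the arrow from Hugo Weaving finds both movies, while following it from The Matrix finds nothing, because no relationship starts there.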
Modeling incrementally allows you to ask what sorts of information your application will need, based on the types of queries it has to answer. And this will help guide you in your modeling activities. If we revisit this graph, and we want more of the information needed to support our application, we can start adding properties to the nodes, and properties to the relationships. In this case, we have added quite a bit of information. For the actors Tom Hanks and Hugo Weaving, we've added properties.
Properties in graph databases are represented as key-value pairs. The key here, if you look at the Tom Hanks node in the upper left-hand corner, is nationality, and the value is USA. There's another property key, called won, and it has two values, Oscar and Emmy. Likewise, Hugo Weaving, in the upper right, has a nationality of Australia, and he has won an MTV Movie Award.
The ACTED_IN relationships have also been augmented with roles as a property. The roles property holds the name of the character the actor played in that particular movie. So in the case of Hugo Weaving, he took on the role of Agent Smith in The Matrix. The Matrix has been augmented with the genre, in this case sci-fi.
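As a rough sketch in plain Python, properties on nodes and on relationships are just key-value pairs attached to each item. The dictionary layout, the use of lists for multi-valued properties, and the helper `roles_of` are assumptions for illustration; the keys and values are the ones named above.

```python
# Node properties: one dictionary of key-value pairs per node.
nodes = {
    "Tom Hanks": {"nationality": "USA", "won": ["Oscar", "Emmy"]},
    "Hugo Weaving": {"nationality": "Australia", "won": ["MTV Movie Award"]},
    "The Matrix": {"genre": "sci-fi"},
}

# Relationships carry their own properties too.
relationships = [
    {
        "start": "Hugo Weaving",
        "type": "ACTED_IN",
        "end": "The Matrix",
        "properties": {"roles": ["Agent Smith"]},
    },
]


def roles_of(actor, movie):
    """Collect the roles stored on ACTED_IN relationships from actor to movie."""
    found = []
    for rel in relationships:
        if (rel["start"], rel["type"], rel["end"]) == (actor, "ACTED_IN", movie):
            found.extend(rel["properties"].get("roles", []))
    return found


print(nodes["Tom Hanks"]["won"])               # ['Oscar', 'Emmy']
print(roles_of("Hugo Weaving", "The Matrix"))  # ['Agent Smith']
```

The role lives on the relationship rather than on the actor or the movie, since it only makes sense for that particular pairing.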
And Lana Wachowski has also been augmented with the same key-value pairs as the actors. It is important to note that this is completely schemaless. There is no need to declare a schema as there is in a relational database management system. The schema is actually defined by the data itself, leaving complete freedom to add titles or properties as you see fit. So, in the case of The Matrix, we could arbitrarily add the year released as a property and fill it in with the correct value.
We could also add whether it had won any awards, or any number of other things, and there's no need to update the entire database. In fact, a property can be present on just one node if necessary. This is a far cry from the world of RDBMS, where the schema has to be changed before a new kind of data can be added. New in Neo4j 2.0 is the ability to add labels to nodes. In this case, Tom Hanks has been given two labels, Person and Actor.
Same for Hugo Weaving and Lana Wachowski. The movies have been given the label Movie. This is a very important improvement on previous versions of Neo4j. It will not be readily apparent why until we get into the actual database work, and start modeling and working with the data itself. In the next chapter, we're going to review the Neo4j browser console.
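To tie together the labels and the schemaless properties covered in this chapter, here is a minimal sketch in plain Python. It is not Neo4j code: the dictionary layout, the property key `released`, and the helper `with_label` are assumptions for illustration, and only the nodes whose labels are stated explicitly are included.

```python
# Each node has a set of labels and a dictionary of properties.
nodes = {
    "Tom Hanks": {"labels": {"Person", "Actor"},
                  "properties": {"nationality": "USA"}},
    "Hugo Weaving": {"labels": {"Person", "Actor"},
                     "properties": {"nationality": "Australia"}},
    "The Matrix": {"labels": {"Movie"}, "properties": {"genre": "sci-fi"}},
    "Cloud Atlas": {"labels": {"Movie"}, "properties": {}},
}

# Schemaless: add a property to one node without touching any other node.
nodes["The Matrix"]["properties"]["released"] = 1999


def with_label(label):
    """Names of all nodes carrying the given label, in sorted order."""
    return sorted(name for name, node in nodes.items()
                  if label in node["labels"])


print(with_label("Actor"))  # ['Hugo Weaving', 'Tom Hanks']
print(with_label("Movie"))  # ['Cloud Atlas', 'The Matrix']
print("released" in nodes["Cloud Atlas"]["properties"])  # False
```

Labels give a way to pick out groups of nodes, while the `released` property exists only on the one node it was added to.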
- What are NoSQL and graph databases?
- Examining basic graph modeling
- Using Cypher to query Neo4j
- Using paths to traverse multiple nodes
- Getting properties back from paths
- Using specific nodes
- Handling conditions
- Creating entities
- Deleting entities