- [Instructor] Now that we can connect to Neo4j from Python, let's start adding data to our graph. The first thing that we'll do is execute a CREATE CONSTRAINT statement to create a uniqueness constraint on the name property of the Character label. This asserts that name is unique, so we can't create any duplicate character nodes. Now let's write a Cypher query to create characters. Since we want to avoid creating duplicates, we'll use the MERGE statement instead of CREATE.
So we'll say MERGE on a node c with the label Character. Now I'm specifying the name property. Previously we used LOAD CSV, and we had a row object that we could reference to retrieve values from. However, this time we won't be using LOAD CSV. Instead, we'll be running parameterized Cypher queries. In Cypher we specify a parameter with $ and then the name of the parameter, in this case name.
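As a rough sketch, the two statements described so far might be written as Python strings like the ones below. The variable name create_constraint_query and the capitalized Character label are assumptions, and the constraint uses the older ON ... ASSERT form; newer Neo4j versions spell this FOR ... REQUIRE instead.

```python
# Uniqueness constraint on Character.name (older Neo4j syntax, assumed here).
# Newer versions use:
#   CREATE CONSTRAINT FOR (c:Character) REQUIRE c.name IS UNIQUE
create_constraint_query = (
    "CREATE CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE"
)

# MERGE only creates the node if no Character with this name exists yet.
# $name is a query parameter that is supplied when the query is run.
create_character_query = "MERGE (c:Character {name: $name})"
```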
Let's execute this cell. In this next block we create a new session object, which is a connection to the Neo4j server, and we open characters.txt, which is our list of characters from A Midsummer Night's Dream. For each line, that is, for each character, we want to strip that line of white space and then execute our create character query, passing in the name parameter for that line.
We can use the strip string function in Python to remove white space from the line, and now let's execute our create_character_query passing in an object for our parameters. In this case we have one parameter that's called name. That will be the key for this object or Python dictionary. And the value will be text, which is the line that we've read from the file stripped of white space, or the character name.
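The loop just described could look roughly like this. It is only a sketch: the helper name load_characters is hypothetical, skipping blank lines is an assumption about the file, and the session argument is assumed to be a neo4j driver session, or anything else with a run(query, parameters) method.

```python
create_character_query = "MERGE (c:Character {name: $name})"


def load_characters(session, lines):
    """Run the MERGE query once per non-empty line and return the names sent."""
    names = []
    for line in lines:
        text = line.strip()  # remove surrounding white space and the newline
        if not text:
            continue  # assumption: blank lines are not character names
        # The dictionary key matches the $name parameter in the query.
        session.run(create_character_query, {"name": text})
        names.append(text)
    return names


# Possible usage with a real driver (not executed here):
#   with driver.session() as session, open("characters.txt") as f:
#       load_characters(session, f)
```

Taking the session as an argument keeps the loop separate from the connection setup, which makes it easy to try out with a stand-in object.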
So let's go ahead and execute that statement. Now we can jump back to Neo4j to verify that we've actually created data. We can click the database slide-out. We can see we have Character labels. If we click those, we see a sample of 25 characters. However, we don't have any relationships, so let's start adding some lines from the play. So our task for the create operation is, for each line in the text, to create a node in the database.
We want to keep track of the character speaking. In Neo4j that's represented with a relationship from the character node to the line node. And we'll store the line number as a property on the line node, as well as a property that contains the phrase itself. And again, we'll want to strip any white space from that line. So the first step is to write a Cypher query that takes the line as parameters, both the text and the line number, and then creates that in the graph.
We can use a CREATE statement here. Since we're iterating over this file once, we don't really need to worry about creating duplicate lines. The text property we'll pass in as a parameter. Let's call that line_text. And we also need a number property that we'll pass in with the parameter named line_num. We'll use a WITH statement to specify that we're bringing through this line node.
It's just a requirement in Cypher to use a WITH in between CREATE and MATCH. So once we've created the line, we want to match on the character who's speaking this line. So we need to keep track somewhere of the current speaking character. We'll pass that name in with another parameter named current_character. And then we create a relationship. This is our speaks relationship connecting the character to the line.
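Put together, the query described above might read something like the following. The variable name create_line_query, the Line and Character label spellings, and the SPEAKS relationship type are assumptions; the property and parameter names are the ones mentioned in the narration.

```python
# Creates the Line node, carries it through WITH, then finds the speaker
# and connects the two with a SPEAKS relationship.
create_line_query = """
CREATE (l:Line {text: $line_text, number: $line_num})
WITH l
MATCH (c:Character {name: $current_character})
CREATE (c)-[:SPEAKS]->(l)
"""
```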
Let's execute this cell to define that query. Now the next piece that I have here is not essential to our task, so I've already filled it out, but let's discuss what this is doing. We're going to need to keep track of a set of all of the individual characters that show up. Now we have the text file that contains the list of characters. I chose to read these back from the database just as a chance for us to see one more example of executing a query against Neo4j.
So characters we define as an empty set. We then define a simple Cypher query for returning all character names. And then for each character that we find, we simply add it to this set of characters. And so we can see the set object is now in scope. It contains an entry for each unique character. Let's import the time package. Part of the requirement of this problem is that we keep track of the operation time for each operation.
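Reading the character names back into a set, as described above, could be sketched like this. The function name fetch_character_names and the name alias in the RETURN clause are assumptions; the session is assumed to return records that can be indexed by key, as neo4j driver records can.

```python
def fetch_character_names(session):
    """Return the set of all Character names currently in the database."""
    characters = set()
    query = "MATCH (c:Character) RETURN c.name AS name"
    for record in session.run(query):
        # Each record exposes the returned column under its alias.
        characters.add(record["name"])
    return characters
```

A set is a natural fit here because the later loop only needs to ask whether a given line is a character name.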
Time allows us to take a snapshot of the timestamp both at the beginning and the end of each task allowing us to keep performance timing. So now we're ready to read Midsummer_Nights_Dream.txt. We'll keep track of the current character. This will be the character that is currently speaking. Each character may speak multiple lines, so we'll be sure to update that as we iterate through each line if we see that a new character is speaking.
So the first thing that we want to do is use the strip string method to strip any white space from the line, and then we need to determine if this is a new character speaking. Now in this text file, a new character speaking is signified simply by the name of that character on the line. So if the line shows up in our set of characters, well, that means that this is a new character speaking.
So we need to update the current character. We set that equal to the line stripped of white space, which will just be the name of the character that's currently speaking. And then we don't do anything else in that case. If this is not a change in the character speaking, if this is an actual line that's spoken by some character, then we need to increment the counter that's keeping track of the number of lines and then execute our Cypher query to add the line to the database.
Now we need to be sure to pass in our parameters. We have a line_text parameter that's going to be the text of the line. We have a line_num parameter. That's the line number that we've been incrementing. And then we also need to pass in the current character, which is the character that's speaking this line. Now we also grab the timestamp for the end.
This is after we've written all of these lines to the database. We compute the total time that this create operation took, divide that by the number of lines, and multiply by a thousand, which gives us the number of milliseconds per line that it takes to write to the database. Let's go ahead and execute this. We can see that it took 0.36 milliseconds per line to write this to the database. Let's jump back over to Neo4j to verify that this data was actually written to the database.
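The whole loop, including the timing, might look roughly like the sketch below. The function name load_lines and the label and relationship spellings in the query are assumptions, and the sketch assumes the file contains only character names and spoken lines. The session can be any object with a run(query, parameters) method.

```python
import time

create_line_query = """
CREATE (l:Line {text: $line_text, number: $line_num})
WITH l
MATCH (c:Character {name: $current_character})
CREATE (c)-[:SPEAKS]->(l)
"""


def load_lines(session, lines, characters):
    """Write each spoken line to the graph; return (line count, ms per line)."""
    current_character = None  # the character who is currently speaking
    line_num = 0
    start = time.time()  # timestamp before the first write
    for raw in lines:
        text = raw.strip()
        if text in characters:
            # A bare character name marks a change of speaker.
            current_character = text
            continue
        line_num += 1
        session.run(create_line_query, {
            "line_text": text,
            "line_num": line_num,
            "current_character": current_character,
        })
    end = time.time()  # timestamp after the last write
    # Seconds per line, converted to milliseconds; guard against empty input.
    ms_per_line = (end - start) / line_num * 1000 if line_num else 0.0
    return line_num, ms_per_line
```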
Now we have a Line node label. Let's just click on Line to give us 25 lines at random. And we can double-click on these lines to see which character spoke them. So this line is Line 20. The text is, "Awake the pert and nimble spirit of mirth," spoken by Theseus. We can double-click on Theseus to see all of the lines that he's spoken throughout the play. So this data looks correct. Let's go back to our Jupyter notebook and continue on with the problem.