Join William Lyon for an in-depth discussion in this video Data modeling, part of Database Clinic: Neo4J.
- [Instructor] The task is to import a CSV file of California population prediction data into Neo4j. With any data import task, the first step is to define the data model that we'll use to represent the data. Unlike other databases that may use tables or documents to represent data, Neo4j uses a data model called the property graph. The property graph data model consists of nodes, relationships, and key-value pair properties that can be stored on both nodes and relationships.
Nodes are the entities in our graph; think nouns. Relationships connect nodes; think verbs, as in "person authors post" or "user likes post." Key-value pair properties describe the nodes and relationships; think of these as adjectives. We'll use what's called a node label to define the type of node, so here we have user and post. Labels are somewhat similar to tables in the relational world.
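As a rough illustration of the property graph vocabulary just described, the following Python sketch represents nodes, relationships, and properties with plain dataclasses. The class names and sample values (such as "alice" and "Hello graphs") are hypothetical and are not part of Neo4j's API; the sketch is only meant to make the terms concrete.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """An entity in the graph (a noun), typed by a label."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes (a verb)."""
    start: Node
    rel_type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: a user who likes a post.
user = Node("User", {"name": "alice"})
post = Node("Post", {"title": "Hello graphs"})
likes = Relationship(user, "LIKES", post, {"since": 2017})


def describe(rel: Relationship) -> str:
    """Render a relationship in a Cypher-like pattern notation."""
    return f"(:{rel.start.label})-[:{rel.rel_type}]->(:{rel.end.label})"


print(describe(likes))
```

Note that properties can sit on the relationship as well as on the nodes, which is what distinguishes the property graph from a plain labeled graph.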
On the left is the CSV file of California population projections by county that I downloaded from the California open data portal. The process I like to use for graph data modeling is to first identify the entities in the data. These entities become the nodes in my data model. As we look at the spreadsheet that we're tasked with importing here, we can see we have some information about the county, like code and name. We also have information about the prediction, such as the year, race, gender, age, and population.
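The entity-identification step can be sketched in Python by reading the CSV header and sorting the columns into county fields and prediction fields. The column names and the two sample rows below are made up for illustration; the real file's headers and values may differ.

```python
import csv
import io

# Hypothetical sample mimicking the file's layout; the real column
# names and values in the downloaded CSV may differ.
SAMPLE_CSV = """county_code,county_name,year,race,gender,age,population
1,Alameda,2020,Asian,Female,30,1234
1,Alameda,2020,Asian,Male,30,1200
"""

# Assumed grouping: these columns describe the county entity,
# and every other column describes the prediction.
COUNTY_COLUMNS = {"county_code", "county_name"}

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
columns = reader.fieldnames or []

county_fields = [c for c in columns if c in COUNTY_COLUMNS]
prediction_fields = [c for c in columns if c not in COUNTY_COLUMNS]

print("County fields:", county_fields)
print("Prediction fields:", prediction_fields)
```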
Here we can see county is an obvious entity, so let's create a node with the label county to represent the counties in the spreadsheet. The other fields that we have here, such as race, gender, age, and population, describe a prediction. Our next node label will be population prediction, and a given population prediction is for a specific county.
We'll use the relationship type FOR_COUNTY, and let's reverse the direction there so that it runs from the population prediction to the county. This is the basic data model that we'll use: a node label for county and a node label for population prediction, with the FOR_COUNTY relationship connecting them. The next part of our data modeling process is to define the node properties. These will describe our nodes and relationships. For county we see that we have a county code and a county name, so let's add those to our data model.
Code here is an integer and name is a string, so we'll specify the data types. Finally, for population prediction, we can see that we have race, which is a string; gender; age, which is an integer; and population, also an integer. This is the basic graph data model that we'll use. Now that we've defined this, we can see how to import this data into Neo4j.
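To make the finished model concrete, here is a minimal Python sketch that converts CSV rows into typed county and population prediction objects joined by a FOR_COUNTY relationship. Several details are assumptions rather than anything stated in the video: the relationship name is one reading of the spoken "for county," gender is treated as a string, the column names and sample rows are invented, and the property list mirrors the one given above (so year is left out).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class County:
    code: int   # integer, per the model above
    name: str   # string


@dataclass(frozen=True)
class PopulationPrediction:
    race: str
    gender: str      # assumed to be a string; the type is not stated
    age: int
    population: int


def row_to_graph(row: Dict[str, str]) -> Tuple[PopulationPrediction, str, County]:
    """Convert one CSV row (all strings) into typed nodes joined by FOR_COUNTY."""
    county = County(code=int(row["county_code"]), name=row["county_name"])
    prediction = PopulationPrediction(
        race=row["race"],
        gender=row["gender"],
        age=int(row["age"]),
        population=int(row["population"]),
    )
    # Direction: (prediction)-[:FOR_COUNTY]->(county)
    return (prediction, "FOR_COUNTY", county)


# Made-up rows for illustration only; column names are hypothetical.
rows: List[Dict[str, str]] = [
    {"county_code": "1", "county_name": "Alameda", "race": "Asian",
     "gender": "Female", "age": "30", "population": "1234"},
    {"county_code": "1", "county_name": "Alameda", "race": "Asian",
     "gender": "Male", "age": "30", "population": "1200"},
]

triples = [row_to_graph(r) for r in rows]

# Many predictions point at the same county, so counties are
# de-duplicated by code rather than created once per row.
counties = {county.code: county for _, _, county in triples}

print(len(triples), "predictions,", len(counties), "distinct county node(s)")
```

De-duplicating counties by code reflects the shape of the model: each row yields its own prediction node, while all rows for one county share a single county node.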