Learn how to simulate a social network.
- [Instructor] In this demonstration, I'm going to show you how to simulate a social network in three easy steps. Those are, one: generate a graph object and edgelist, two: assign attributes to graph nodes, and three: visualize the network. In the next demonstration, I'm going to show you how to analyze a social network for insights. For now, I want to show you how to use a graph generator to simulate a social network. In the process, you'll learn to assign attributes to graph nodes and visualize the results.
So in this demonstration, you'll need NumPy and pandas. We're also going to import networkx as nx, and bring in our matplotlib and seaborn. Let's set the plot parameters for this Jupyter notebook. We'll run these. So now our notebook is set up for graphing. Let's use the gn_graph function to generate a directed graph with seven nodes. We'll call it DG, and we'll say nx.gn_graph(7) and set our seed to 25, so DG = nx.gn_graph(7, seed=25).
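As a rough sketch of the graph-generation step, assuming a reasonably recent NetworkX release, the call might look like the following. Only networkx is imported here, because the other libraries are not needed to build the graph itself.

```python
import networkx as nx

# Build a growing-network directed graph with seven nodes.
# The seed makes the random structure reproducible between runs.
DG = nx.gn_graph(7, seed=25)

print(DG.is_directed())        # True: gn_graph returns a DiGraph by default
print(DG.number_of_nodes())    # 7
print(DG.number_of_edges())    # 6: every node after the first adds one edge
```

Because each new node attaches to exactly one existing node, a gn_graph with n nodes has n - 1 edges.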
Next we need to generate an edgelist. We'll use the generate_edgelist function to do that. The generate_edgelist function returns lines of data in edgelist format. Here we use a for loop to pass through and print each line for the DG graph object. So we'll say for line in nx.generate_edgelist, then we'll pass in our DG graph, then the parameter data=False, and for each line, we print(line).
We'll run this. So you see that the generate_edgelist function returned one string per edge, and together those lines define the edges that were generated for the graph. In this example, the function generated an edge from node 1 to 0, from node 2 to 0, from node 3 to node 2, and so on. You can attach attributes to graph nodes or edges. Attachable attributes include weights, labels, colors, or other Python objects. In this example, I'm going to show you how to attach label attributes to the nodes of the DG graph we just created.
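The edgelist loop described above can be sketched like this. The variable name edge_lines is just an illustrative choice, not something from the demonstration.

```python
import networkx as nx

DG = nx.gn_graph(7, seed=25)

# generate_edgelist yields one string per edge, e.g. "1 0" for an edge 1 -> 0.
# data=False leaves the edge attribute dictionary out of each line.
edge_lines = list(nx.generate_edgelist(DG, data=False))

for line in edge_lines:
    print(line)
```

Each printed line holds a source node and a target node separated by a space.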
As of now, if you select and print node 0, you'll see that it's just an empty attribute dictionary. We'll try that out. DG.node, to access the node, and we'll select node 0. And there's nothing in the dictionary. But what we can do is directly manipulate the attribute dictionary by selecting the node and attaching a category label to it. So let's try that. DG.node. We selected the node. And then we're going to select node 0, and we're going to attach an attribute name.
Call it 'name'. And then we'll set that name equal to Alice. Let's reprint node 0 and see what we get. We'll say print(DG.node[0]). You can see that it's an attribute dictionary that contains one key-value pair, where name is the attribute key and Alice is the attribute value. Now let's attach a name value for each node in the DG object. We'll say DG.node...
So we're going to just fill in the rest of the nodes with names, so I'm going to copy and paste this down. So for node 1, we want to call him Bob. Node 2 will be Claire. Node 3, Dennis. Node 4, Esther. Node 5, Frank. And node 6 will be George. Okay, we'll get rid of this one, we have no more nodes. Run this. It's going to populate our name attribute. You can also add attributes to nodes by calling the add_nodes_from function and passing in a list of pairs, one for each node, where each pair holds the node and a dictionary of its key-value attributes.
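Here is a minimal sketch of the name-labeling step. One assumption to flag: recent NetworkX releases expose the attribute dictionaries as DG.nodes, and the older DG.node spelling used in the recording is no longer available there, so the sketch uses DG.nodes. The loop is a compact stand-in for the copy-and-paste approach in the demonstration.

```python
import networkx as nx

DG = nx.gn_graph(7, seed=25)

print(DG.nodes[0])             # {} : no attributes attached yet

# Attach a single attribute by writing into the node's attribute dictionary.
DG.nodes[0]['name'] = 'Alice'
print(DG.nodes[0])             # {'name': 'Alice'}

# Fill in every node; list position doubles as the node number.
names = ['Alice', 'Bob', 'Claire', 'Dennis', 'Esther', 'Frank', 'George']
for node, name in enumerate(names):
    DG.nodes[node]['name'] = name

print(DG.nodes[6])             # {'name': 'George'}
```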
Let's do that here. So we'll write the name of our object DG, and we'll say .add_nodes_from, and then we're going to pass in a list of pairs, each one a node number with a dictionary. So let me just set up the syntax on that. And this is going to be the age attribute, so we'll define that as the key for our dictionaries. I'll copy and paste that in. So for node 0, we want to assign an age of 25.
For node 1, we're going to assign an age of 31. For node 2, we're going to assign an age of 18. For node 3, 47. For node 4, 22. For node 5, 23. And node 6, 50. We'll get rid of this comma there. Bump this down, just so we can read it.
And then we'll go ahead and print out node 0. So we'll say print(DG.node[0]). When we select and print node 0, you can see that the age attribute has been attached and is assigned a value of 25. Very good. Lastly, let's attach a gender attribute to each node in our network, and then populate it with values. So let's write the name of our object and then access its nodes, and I'm just going to set up this syntax real quick.
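The age step might be sketched as follows, again assuming the DG.nodes accessor from recent NetworkX releases. add_nodes_from accepts (node, attribute dictionary) pairs, and for nodes that already exist it updates their attributes rather than replacing them.

```python
import networkx as nx

DG = nx.gn_graph(7, seed=25)
DG.nodes[0]['name'] = 'Alice'  # an existing attribute, to show it survives

# One (node, attribute_dict) pair per node.
DG.add_nodes_from([
    (0, {'age': 25}),
    (1, {'age': 31}),
    (2, {'age': 18}),
    (3, {'age': 47}),
    (4, {'age': 22}),
    (5, {'age': 23}),
    (6, {'age': 50}),
])

print(DG.nodes[0])             # {'name': 'Alice', 'age': 25}
```

The existing edges are left untouched, since the nodes were already in the graph.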
This is going to be an attribute called gender. So I'll get that, and then I'm going to copy and paste this down for efficiency. So node 0 is going to be female. 1 is male. 2 is female. 3 is male. 4 is female. 5 is male. And 6 is male. Okay, now it's time to take a look at the synthetic network we've created. We'll do that by calling the draw_circular function on our DG object.
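A short sketch of the gender step, under the same DG.nodes assumption. The call to nx.get_node_attributes at the end is an addition for checking the result and is not part of the demonstration.

```python
import networkx as nx

DG = nx.gn_graph(7, seed=25)

# Gender values for nodes 0 through 6, in order.
genders = ['female', 'male', 'female', 'male', 'female', 'male', 'male']
for node, gender in enumerate(genders):
    DG.nodes[node]['gender'] = gender

# Collect one attribute across all nodes into a {node: value} dictionary.
gender_by_node = nx.get_node_attributes(DG, 'gender')
print(gender_by_node[0])       # female
```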
So we'll say nx.draw_circular(DG), and let's also use node_color='bisque' and with_labels=True, and print that out. The deprecation warning doesn't affect our analysis, so we're safe with just ignoring that. So great. We have a directed graph, and we can see that nodes 1, 2, and 5 all have relationships that flow to node 0.
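The circular drawing could be sketched roughly like this. Creating the figure and axes explicitly and passing ax=ax is my own choice so the sketch also works outside a notebook. The predecessors call is an extra way to read off who points at node 0 without relying on the picture.

```python
import matplotlib.pyplot as plt
import networkx as nx

DG = nx.gn_graph(7, seed=25)

# Draw the nodes evenly spaced on a circle, labeled by node number.
fig, ax = plt.subplots()
nx.draw_circular(DG, node_color='bisque', with_labels=True, ax=ax)

# Nodes with an edge pointing into node 0 (its "followers").
followers_of_0 = sorted(DG.predecessors(0))
print(followers_of_0)
```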
In Twitter terms, you'd say that nodes 1, 2, and 5 follow node 0 but node 0 doesn't follow back. But discussing vertex objects by node number is sort of esoteric. It would make more sense to refer to them as people or profiles. To label the graph drawings with names instead of node numbers, we can send a dictionary as an argument to the drawing function. We'll start by creating a dictionary called labeldict. So we'll say labeldict =, and then we're going to pass in a series of key-value pairs.
So for key 0, the value is going to be Alice. For node 1, you're going to have Bob. Node 2 is going to be Claire. Node 3 will be Dennis. Node 4 is Esther. Node 5 is Frank. And node 6 is George. The next thing we need to do is pass a labels=labeldict argument into the drawing function and plot the graph out with with_labels=True.
So let's just copy and paste our drawing function from above. And we'll just add an extra argument here where labels=labeldict, and run that. And this graph is much easier to interpret. It appears that Alice is the most popular person in this network. To transform our directed graph to an undirected graph, all we have to do is call the to_undirected method off of the DG object.
I'll show you now. We'll call the output G. Write the name of our directed graph, DG, and then to_undirected, and then let's draw this graph out. We'll use the draw_spectral function. So to do that, you say nx.draw_spectral, and we'll pass in our object G with labels. So labels=labeldict, node_color='bisque', and with_labels=True.
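The labeled drawing might look like the sketch below. As before, the explicit figure and the ax=ax argument are assumptions added so the example runs on its own.

```python
import matplotlib.pyplot as plt
import networkx as nx

DG = nx.gn_graph(7, seed=25)

# Map each node number to the name that should be drawn on it.
labeldict = {0: 'Alice', 1: 'Bob', 2: 'Claire', 3: 'Dennis',
             4: 'Esther', 5: 'Frank', 6: 'George'}

fig, ax = plt.subplots()
nx.draw_circular(DG, labels=labeldict, node_color='bisque',
                 with_labels=True, ax=ax)
```

The labels dictionary only changes the text that is drawn. The nodes themselves are still numbered 0 through 6.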
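A minimal sketch of the conversion and the spectral drawing, with the same assumed figure setup as the earlier drawing examples:

```python
import matplotlib.pyplot as plt
import networkx as nx

DG = nx.gn_graph(7, seed=25)
labeldict = {0: 'Alice', 1: 'Bob', 2: 'Claire', 3: 'Dennis',
             4: 'Esther', 5: 'Frank', 6: 'George'}

# to_undirected returns a new Graph; DG itself is left unchanged.
G = DG.to_undirected()
print(G.is_directed())         # False

fig, ax = plt.subplots()
nx.draw_spectral(G, labels=labeldict, node_color='bisque',
                 with_labels=True, ax=ax)
```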
Now you can see that our social network graph has been converted into an undirected graph. Let's look at graph G though. Because it's undirected, each of the edges appears to be bidirectional. In fact, the above plot could actually be interpreted as Alice following Bob, Frank, and Claire, with no follow backs. This could potentially make her the least important node in the network instead of the most important one. See why directionality is fundamental to social graphs? There are a bunch of other metrics that are important for analyzing social networks, and you're going to see those next.
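One way to see what the conversion throws away is to compare degrees before and after. This is only an illustrative sketch and is not part of the demonstration: the directed graph separates incoming from outgoing edges, while the undirected graph reports a single combined count.

```python
import networkx as nx

DG = nx.gn_graph(7, seed=25)
G = DG.to_undirected()

# In the directed graph we can tell followers (in) from following (out).
# In the undirected graph only the combined degree is left.
for node in DG.nodes:
    print(node,
          'in:', DG.in_degree(node),
          'out:', DG.out_degree(node),
          'undirected:', G.degree(node))
```

From the undirected degree alone, there is no way to tell whether a well-connected node is followed by many people or is following many people.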