In the world of social network analysis, a clique is defined as a group of vertices where all possible links are present.
- [Narrator] If you survived your 7th through 12th grade years of secondary school, which are called junior high school and high school in the United States, you are no doubt familiar with cliques. In the world of social network analysis, a clique is defined as a group of vertices where all possible links are present. In this movie, I will show you how to identify and enumerate cliques in your network. The functions that I will use are in the igraph packet, so I need to load that in. I'll type library, left parentheses, igraph, followed by right parentheses, and enter.
And the igraph package is attached. That package was available to me, because I downloaded it from a CRAN site on the Internet. So, if you don't have igraph on your system, you can visit a CRAN site using techniques I showed you earlier in this course. The first thing I want to do is to create a random graph. So, I will assign that to the variable gnp. Then, a less than sign and hyphen to indicate assignment, and the function I'll use is sample_gnp(.
Then, I need to set the parameters for this creation. I will ask for 20 nodes, or vertices, in the network, comma. The probability of a link existing between any two pairs of vertices is 0.3, comma. We'll make this an undirected network. So, directed equals FALSE, with FALSE in capital letters, and then a comma, and I don't want to allow two links between a pair of vertices, so we'll say loops equals FALSE.
Then, a right parentheses and enter, and we have our graph. And to visualize it, I will type plot, and then in parentheses, gnp, the name of the graph and enter, and there we see it on the right. Because this graph was generated randomly, it is highly likely that the graph that you see will look very different. But that's fine, we can work with either. So, I'll go ahead and close the visualization. Now, I can analyze the cliques. The first thing I'll do is calculate the size of the largest clique, that is, the largest fully-connected subgroup within the graph.
So, for that, I'll type clique_num, N-U-M. Then, in parentheses, gnp, which is the name of my graph, and enter, and we see that the largest clique for this graph is three. That's unusual. Most cases when I run this particular algorithm, I'll get cliques of at least size four. And, in fact, why don't I go ahead and do that again? I will use the up arrow key, and I will rerun the gnp command with an enter.
And I'll rerun clique_num again, and enter. And there I see a result of four, which I think is more interesting. Next, let's say that I want to display the number of cliques that are of a particular size. So, let's say that I only want to see the cliques that are of size four at least. So, I will type cliques, just the word cliques, left parentheses, gnp, the name of my graph, comma, and then min equals four. Then, a right parentheses and enter, and I see that we have two cliques of four items, and you can see that both of them include vertices five, seven, and 10.
And the other two vertices are 17 and 20. And if we go back to see the graph, I can see that I have five, seven, 10, and 17 all interconnected. And then, also, five, 10, 17, and 20 all fully interconnected. And it's a little bit easier to visualize five, 10, 17, and 20, because that particular clique is toward the edge of the graph.
I'll go ahead and close the visualization. One thing to note about the cliques keyword is that you can set the minimum and the maximum. So, let's say that for cliques, for gnp, I want to see min equals three and max equals four, in parentheses, and enter. And then, I see that there are 49 cliques that meet that particular criteria. So, you can see here that we have three of 20 vertices all in a clique.
A lot of them involve 12 and 18, 13 and 18. And then, for four out of 20, we see the two items that we had before. And in double square brackets, you can see the count, so we have 48 and 49, which means that we have two cliques with four members. And for three, we have 47, so that's one through 47. And just to give you an idea as to the number of cliques that can occur when you have a highly-interconnected graph, I'm going to go back and create this graph again, but I will use a connection probability of .6.
So, I will do my up arrow key, and here we are with the gnp assignment command. And I've just changed the probability of any individual link occurring from .3 to .6. So, I'll press enter, and then I will plot it using plot(gnp), so we should see a very link-heavy graph. Yep, that is very highly interconnected. And now, if I take a look at the largest clique number, so I'll go back up to clique_num, enter, and we see that the largest clique is six.
Identifying cliques in the context of a randomly-generated network gives you some interesting information about how network density affects the number of cliques available. On the other hand, if you're looking at live data, such as through email correspondence and who emails whom, then the identification of cliques gives you ideas of which groups of people work together all the time, and perhaps to the exclusion of others.
- Formatting data
- Creating network graphs
- Measuring connectedness and betweenness of points
- Calculating networking density
- Describing network components
- Visualizing a network
- Writing a network description to a text file