- [Instructor] Data visualization is often about showing the connections between objects and the hierarchy among them. This category of data requires a different way of thinking about how to display it in useful, understandable, and meaningful ways. This video's going to look at a few visual approaches to this type of data. I'm going to be using a bunch of examples from the D3 website. If you're in data visualization, you may already be aware of D3, and if not, you probably will be very soon.
The first one we're going to look at is called a tree diagram. It's sort of the default form for hierarchical data. Whether you're thinking about an organization chart or anything where a parent object has children, and maybe those children have more children, et cetera, it's an easy, understandable, and simple way to display that hierarchy. It's sort of like the bar chart for hierarchical data. So when you have data and you want to show the connections between the different objects and maybe the hierarchy of the objects, the default is going to be a tree diagram.
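The parent-and-children structure described here can be sketched in a few lines of Python. This is only an illustration, not the D3 example from the video: the org-chart names and the `outline` helper are made up for the sketch, and the output is a plain indented outline rather than a drawn diagram.

```python
# Hypothetical org-chart data: each node has a name and a list of children.
org = {
    "name": "CEO",
    "children": [
        {"name": "VP Engineering", "children": [
            {"name": "Backend Lead", "children": []},
            {"name": "Frontend Lead", "children": []},
        ]},
        {"name": "VP Sales", "children": [
            {"name": "Account Manager", "children": []},
        ]},
    ],
}


def outline(node, depth=0):
    """Return the tree as a list of indented lines, parents before children."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(org)))
```

Each level of indentation corresponds to one level of the hierarchy, which is the same information a tree diagram encodes with position and connecting lines.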
It's sort of where you can start thinking about it, at least, and then you can experiment from there. You can also use interactivity to bring it to life. For instance, if I click on these nodes, I can collapse them. I can get the entire thing to collapse into a much smaller form, and then some of these are clickable to open up more details. Another form, for more complex data where there are a lot of connections between objects, including connections from one object across nodes to other objects, is called a node-link diagram.
Like a tree, these have objects, the nodes, these little circles, and then links which are the lines connecting the nodes. Like a tree diagram, it's very easy and simple to understand. But like other hierarchical data forms, you can get into trouble quickly. The default in these types of forms is to show everything. I'm trying to show you data about all of this stuff. So by default, I want you to see all the connections, but when there's a lot of data like this, it can be difficult to parse, difficult to understand, difficult to see how this little dot over here relates to this dot over here unless I very carefully follow the chain.
So once again, interactivity can help solve those problems. In this particular example, I can click on these dots and collapse them. If I do that enough times, I get far less detail to look at. And again, if I want to open up more detail, of course, I can do so. As you can see, this diagram sort of floats around and naturally settles in space to help reduce the number of lines that cross each other. That's one of the other problems with this form: as lines cross and more detail gets obscured, it gets even harder to read what's going on.
This is called a force-directed layout, where the graphic sort of settles into a position that best shows all the data. Interactivity can also help in other ways when you have node-link diagrams where there's a lot of information and it's a little hard to see. This example uses what's called fisheye distortion. As I roll the mouse over the data, it sort of zooms in a little bit on that area, again making it a little bit easier to see some of the details between the data points.
Another visualization method that I like for showing both the forest and the trees, meaning both the details and the big-picture view of the data, is something called an adjacency matrix. In this case, this is a 2D display of many, many data points. Each column is one variable, and the rows are those same variables again. It's sort of like the scatterplot matrix we showed in the alternative charts movie before this one.
Here, along the diagonal, you have each variable compared against itself, where, of course, you will always see a correlation. And then this variable, Child2, is shown compared to the other variables. In this case, we're looking at cast members of Les Misérables and when they're on stage at the same time. So Child2 was only on stage with two other characters. Other characters in the play, of course, were on stage a lot more. This character, for example, is on stage with a lot more characters. One of the most important things about this particular data form is the sorting.
Sorting really plays a big role in how you see data using this form. For instance, right now we're sorting by cluster, so the order of the list is by cluster, and that's sort of a way of grouping these characters. But if I sort them by frequency, meaning the characters who appear most frequently are at the top and the least frequent characters are at the bottom, then I get a different view of the data. I'll see different patterns in the data. Or if I sort by name alphabetically, I'll see different patterns yet again. So this is a very interesting way of revealing patterns in data that you might not see using other data forms.
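To make the effect of sorting concrete, here is a minimal Python sketch of an adjacency matrix printed in two different orders. The co-appearance pairs are invented for illustration and are not the actual Les Misérables dataset, the `build_matrix` helper is hypothetical, and the diagonal is left empty for simplicity.

```python
# Hypothetical co-appearance pairs; NOT the real Les Miserables data.
pairs = [
    ("Valjean", "Javert"),
    ("Valjean", "Cosette"),
    ("Valjean", "Marius"),
    ("Cosette", "Marius"),
    ("Child1", "Child2"),
]


def build_matrix(pairs, order):
    """Build a symmetric 0/1 matrix with rows and columns in the given order."""
    index = {name: i for i, name in enumerate(order)}
    size = len(order)
    matrix = [[0] * size for _ in range(size)]
    for a, b in pairs:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


names = sorted({name for pair in pairs for name in pair})

# Frequency here means how many other characters each one appears with.
degree = {name: 0 for name in names}
for a, b in pairs:
    degree[a] += 1
    degree[b] += 1

by_name = names
by_frequency = sorted(names, key=lambda n: (-degree[n], n))

for label, order in [("by name", by_name), ("by frequency", by_frequency)]:
    print(label)
    for name, row in zip(order, build_matrix(pairs, order)):
        print(f"{name:>8} " + " ".join("#" if cell else "." for cell in row))
```

The underlying connections are identical in both printouts. Only the row and column order changes, and that alone moves the filled cells into different visual groupings.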
One of my favorite visualization methods for hierarchical data is called the treemap. This isn't so much about the connections between items as it is about the hierarchy of the items only. This form is also very conducive to interactivity. Over here in this orange category, I can see a bunch of items that live within this category and how much relative space they take up in the entire dataset. And then I can click into this to zoom in and see more details. Sometimes you can click and click and click and go deeper and deeper into this type of visual display.
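The idea that each item's area reflects its share of the whole can be sketched as follows. This uses a simple slice-and-dice split, which is one of the simplest treemap layouts and not necessarily the one used in the D3 demo. The category names, the values, and the `total` and `slice_layout` helpers are all assumptions made for the sketch.

```python
# Hypothetical category data; leaf values are made-up sizes.
data = {
    "name": "all",
    "children": [
        {"name": "orange", "children": [
            {"name": "a", "value": 30},
            {"name": "b", "value": 10},
        ]},
        {"name": "blue", "children": [
            {"name": "c", "value": 40},
        ]},
        {"name": "green", "value": 20},
    ],
}


def total(node):
    """Sum of all leaf values under a node."""
    if "value" in node:
        return node["value"]
    return sum(total(child) for child in node["children"])


def slice_layout(node, x, y, w, h, depth=0, out=None):
    """Give every node a rectangle (x, y, w, h), alternating the split
    direction at each level so area stays proportional to value."""
    if out is None:
        out = {}
    out[node["name"]] = (x, y, w, h)
    node_total = total(node)
    offset = 0.0
    for child in node.get("children", []):
        share = total(child) / node_total
        if depth % 2 == 0:  # even levels split the width
            slice_layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd levels split the height
            slice_layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


rects = slice_layout(data, 0, 0, 100, 100)
for name, (x, y, w, h) in rects.items():
    print(f"{name:>6}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f} area={w * h:.0f}")
```

Zooming into a category, as in the interactive version, amounts to running the same layout again with that category as the root.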
Another form that you see more and more of is called the chord diagram. You have a circular display, and along the outside of the display is each variable. What we're looking at here is Uber rides by neighborhood around San Francisco, for example rides to and from the Financial District using Uber cars. One of the things that makes this form popular these days is that nature loves a curved line, right? There are no straight lines in nature. And so we're attracted to this type of display naturally, but again, sort of like in pie charts, curved lines can be harder to parse.
Now, I can easily see, for instance, from the thickness of the lines, that there are a lot of rides between the Financial District and South of Market. But it is a little hard to parse some of the other things about this data because of the curving nature of the lines. So you have to use this form with caution. Also, with this type of display, you can have an issue with an overwhelming amount of data, and that's where interactivity, once again, can really help. If I roll over each data point, I can filter out the other lines and much more easily see the details that I want to see.
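The data behind a display like this is essentially a matrix of flows, where line thickness follows the number of rides and rolling over a neighborhood filters the lines down to the ones touching it. A rough Python sketch of that logic is below. The ride counts are invented and are not the real Uber data, the two extra neighborhood names are added only to fill out the example, and the `chords` and `highlight` helpers are hypothetical names.

```python
# Hypothetical ride counts; NOT the real Uber data.
# rides[i][j] is the number of rides from neighborhood i to neighborhood j.
neighborhoods = ["Financial District", "South of Market", "Mission", "Marina"]
rides = [
    [0, 120, 30, 25],
    [110, 0, 45, 10],
    [35, 40, 0, 5],
    [20, 15, 5, 0],
]


def chords(rides, names):
    """One chord per pair of neighborhoods, weighted by rides in both
    directions, thickest first."""
    out = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            weight = rides[i][j] + rides[j][i]
            if weight:
                out.append((names[i], names[j], weight))
    return sorted(out, key=lambda chord: -chord[2])


def highlight(all_chords, name):
    """Mimic the rollover: keep only the chords that touch one neighborhood."""
    return [chord for chord in all_chords if name in (chord[0], chord[1])]


all_chords = chords(rides, neighborhoods)
print("thickest chord:", all_chords[0])
print("rollover on Marina:", highlight(all_chords, "Marina"))
```

Printing the weights directly also sidesteps the readability problem with curved lines, since the numbers can be compared exactly.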
Finally, another old standard is the Venn diagram. And this is hierarchical data, right? It's about how many things fall into this big red circle, and how many things fall into this blue circle or the green circle. And where they overlap, of course, you could think of those as child objects of the combined datasets. This is a three-set diagram, right? I have three datasets, and I see where they overlap. I recently saw a seven-set Venn diagram, which was really interesting to look at, but it was incredibly hard to understand what I was looking at.
But it was very fascinating. The Venn diagram is sort of like a bar chart. It's kind of a norm. Everyone is familiar with it. People understand what it means when they look at it. It's so common nowadays, of course, that it's even used as a form of comedy. (laughs) You can do funny little Venn diagrams. You probably see these on Facebook all the time. Like a pie chart, this form is really good at showing the big picture, where things overlap, but not necessarily the specific data, how much the overlap is.
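If you do need the specific numbers that a Venn diagram tends to hide, set arithmetic gives the exact size of every region. Here is a small Python sketch with three invented sets. The members and the `regions` dictionary are assumptions for illustration only.

```python
# Three hypothetical datasets standing in for the red, blue, and green circles.
red = {"a", "b", "c", "d"}
blue = {"c", "d", "e"}
green = {"d", "e", "f", "g"}

# Every region of a three-set Venn diagram, computed with set operations.
regions = {
    "red only": red - blue - green,
    "blue only": blue - red - green,
    "green only": green - red - blue,
    "red and blue only": (red & blue) - green,
    "red and green only": (red & green) - blue,
    "blue and green only": (blue & green) - red,
    "all three": red & blue & green,
}

for label, members in regions.items():
    print(f"{label:>20}: {len(members)} {sorted(members)}")
```

The region sizes add up to the size of the union of the three sets, which is a quick sanity check that nothing was counted twice.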
People spend entire careers thinking about how to visualize hierarchical and relational data. I hope this was a good introduction to some of the basic forms. As always, I recommend that you start simple. Use interactivity to make the overwhelming less overwhelming and experiment with different forms. Play with your data, try these different ways of looking at it in hierarchical forms and just find the one that works best for you.
- Describe the process by which individuals’ interests are incorporated into data visualizations.
- Differentiate the use of the Ws in data visualization.
- Explain techniques involved in defining your narrative when visualizing data.
- Identify the factors that make data visualizations relatable to an audience’s interests and needs.
- Review the appropriate use of charts in data visualizations.
- Define the process involved in applying interactivity to data visualizations.