- [Instructor] So what is data visualization? It is data visualized. No fancy definitions here from me. There are lots of things that go into good data visualizations, but that's not the point of this movie. The point is to really just show some examples and try to set some boundaries for what we're talking about here. And you'll probably notice that the boundaries are pretty broad. I'm not going to constrain it too much. And you'll also note that this is not an exhaustive list. So let's start off with the poor lowly bar chart, right, one of the most basic and common forms.
It's oft maligned but it's arguably the best form of data visualization. If you have one variable to communicate and the emphasis is on the magnitude of the values across different categories, and the goal is probably to allow comparison, you really can't do much better. So in this example, we're looking at a sales performance chart and you could easily see that 2006 was the best year, 2008 was the worst year, 2009 didn't do so hot either, and then the other years are sort of in between those extremes.
It's easy to see the headline and the overall story here. In different chart forms, it might be hard to see the variations, for instance, between 2008 and 2009. The bar chart's great, it works fantastically. And what I would argue is that whenever you're doing visualization, you should almost always ask yourself, why should I not do a bar chart? That should be the first thing you think of. If you're looking at data over time, the line chart is a really great way to go as sort of a default. Or the timeline if you're doing an infographic, right, where it's more time-oriented content but not necessarily quote unquote data, right? If you're dealing with time, timeline, line chart, great defaults.
By the way, one of the advantages of the bar chart, the line chart, and the timeline, and some of these others we're going to talk about is that your audience knows them so well, which means they can immediately recognize them, they know how to read them, they won't be confused by them, and that's a good thing. Another very common pair of charts is the scatter plot and the bubble chart, which is really just a scatter plot with a third variable added. Right, now the bubble size is sort of a third variable. It's telling us a little bit more than just the scatter plot, which covers two variables.
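To make the "third variable" idea concrete, here is a minimal sketch of how a bubble chart might turn that extra value into a bubble size. The data points, the `bubble_radii` helper, and the `max_radius` parameter are all made up for illustration. Scaling the radius by the square root, so that the bubble's area rather than its radius tracks the value, is a common convention and an assumption here, not something stated in this movie.

```python
import math

def bubble_radii(values, max_radius=20.0):
    """Map each value to a radius so that bubble AREA is proportional to the value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(values)
    # Area grows with radius squared, so take the square root of the ratio.
    return [max_radius * math.sqrt(v / largest) for v in values]

# Hypothetical data: each point is (x, y, third_variable).
points = [(1.0, 2.0, 100), (2.0, 3.5, 400), (3.0, 1.5, 25)]

radii = bubble_radii([size for _, _, size in points])

for (x, y, size), r in zip(points, radii):
    print(f"x={x}, y={y}, size={size}, radius={r:.1f}")
```

If the radius were scaled linearly instead, a value four times larger would get a bubble with sixteen times the area, which tends to exaggerate differences.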
There is the oft-reviled pie chart, radar charts and spider charts, tree maps which are an interesting way of showing hierarchical data, and we'll talk more about all of these in other movies. Stream graphs, matrices, these are all effective and interesting ways of showing and displaying data, and comparing things. Sometimes alternative forms really do help to tell a story. So this example is something created by Charles Minard in the 19th Century and what we're looking at here is the march of Napoleon's army onto Moscow and then the retreat from Moscow.
So if we start on the left, the tan line is his army and the thickness of the line shows you how many people were in the army, so 422,000 or so people. And as they march eastward, so we can see geographic information on here, you can see how the army starts to shrink and shrink and shrink, and there are a couple of offshoots where some people go off in different directions. The army shrinks and shrinks and shrinks until they get to Moscow and they're a shadow of their former selves. And then they retreat and they come back, and they shrink and shrink and shrink even more.
And notice when they're about two-thirds of the way back, during their retreat, they hit the Berezina River, and look what happens there. Crossing that river, it was incredibly cold, which you can see down below where you look at the temperatures, and the army was halved yet again; they lost half of their men. And on they go, back, back, back, until you look at that thin line where at one point, right before the people who had split off rejoined them, they were down to 4,000 people. Unbelievable losses, a great example of visualization telling an incredibly powerful story. Another example from the past, this is another 19th Century example, was this visualization created by Florence Nightingale.
Now of course, she's best known for being a caregiver to soldiers, the lady with the lamp. But she was also a medical statistician and a data visualizer. She invented this form, which was called the coxcomb. This one shows monthly mortality rates from the Crimean War due to sanitation and other causes in wartime hospitals. Florence Nightingale often used visualizations when presenting to Parliament when she was trying to effect changes in these conditions in hospitals during wartime. Maps, of course, a very common form of data visualization.
Showing geographic data, and also showing things like election maps and demographic trends, et cetera. A choropleth, this form that we're looking at here, is probably the most common one you see, where regions are colored to represent data intensity in that particular region. So, the darker regions here mean that there's probably more of whatever it is being measured here, the lighter regions less. There's another category of visualizations that you might say are sort of useless, superfluous, what's the point, this is not serious data visualization.
Yeah, okay, you could certainly say that. But that doesn't mean it's not data visualization and it doesn't mean that they're not useful in their own way. So here's an example of one, this is called a massive map of hip hop monikers. This is data visualization. So let me just click in here. And what we're looking at is a very large data set of hip hop monikers, right? Sort of the names that hip hop artists go by. And you see the connections between the names and sort of the categorization of them. So for instance, here we're looking at hip hop monikers that are mineral in nature.
Goldie Loc, right, Goldielocks, Onyx, and you see this connection between that and the parent category, animal, vegetable, mineral. And if I go over here and I see this connection over here, animal, vegetable, mineral goes over to vegetable and I can see Blue Raspberry and Casey Veggies, et cetera. Interesting, not necessarily quote unquote useful. But very interesting and certainly data visualization. Another example of a visualization that again, you could sort of say what is this and why am I looking at it, what's the point? I actually personally love this one.
So this is a visualization of pi. There is a computer that ran for something like a year and calculated pi to ten trillion decimal places. Now this visualization took the first four million of them and laid them out, so all these little dots, every single individual dot represents one of those digits of pi. And so of course I can come here and just scroll down this entire list to get all four million. And as you can see over here in the legend, each color represents a different digit.
I can roll over the list and see what numbers are beneath my mouse pointer, okay? So what is the point? It's kind of neat, it's kind of random. What am I looking at? Well, I can easily see based on this visualization that this looks like static. This looks completely random. And so while this is sort of a, huh, why is this, what is this for? It does reinforce that the digits of pi look random. It serves a purpose, and I think it's a really great and interesting way of doing it. So as I said, this has been a very limited definition of data visualization, mostly through example.
Hopefully it's a good start for thinking about some of those standard forms and also some ways people have thought outside of the bar chart when approaching more complex problems.
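As a small aside on the pi example: the "looks like static" impression can be roughly checked by counting how often each digit shows up. The sketch below is not how the visualization in this movie was built; it assumes a well-known streaming (spigot) algorithm for generating digits of pi, uses only the first 1,000 digits instead of four million, and prints a crude text bar chart of the counts. The names `pi_digits` and `N` are chosen here for illustration.

```python
from collections import Counter
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi one at a time (3, 1, 4, 1, 5, ...).

    Uses a streaming spigot algorithm with exact integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled, so emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough information yet, so consume another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

N = 1000  # far fewer than the four million digits in the visualization
digits = list(islice(pi_digits(), N))
counts = Counter(digits)

# One text bar per digit: roughly equal bars are what "static" looks like.
for d in range(10):
    print(d, str(counts[d]).rjust(4), "#" * (counts[d] // 5))
```

Roughly even counts are consistent with the digits looking random, but they do not prove it. And yes, the quickest way to compare those ten counts is a bar chart.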
- Describe the process by which individuals’ interests are incorporated into data visualizations.
- Differentiate the use of the Ws in data visualization.
- Explain techniques involved in defining your narrative when visualizing data.
- Identify the factors that make data visualizations relatable to an audience’s interests and needs.
- Review the appropriate use of charts in data visualizations.
- Define the process involved in applying interactivity to data visualizations.