Join Bill Shander for an in-depth discussion in this video, "So, what is data visualization?", part of Learning Data Visualization.
So what is data visualization? It is data visualized. No fancy definitions here for me. There are lots of things that go into good data visualizations, but that's not the point of this movie. The point is to really just show some examples and try to set some boundaries for what we're talking about here. And you'll probably notice that the boundaries are pretty broad. I'm not going to constrain it too much. And you'll also note that this is not an exhaustive list. So let's start off with the poor, lowly bar chart, right? One of the most basic and common forms.
It's oft-maligned, but it's arguably the best form of data visualization. If you have two variables to communicate, you really can't do much better. So in this example, we're looking at a sales performance chart and you can easily see that 2006 was the best year, 2008 was the worst year, 2009 didn't do so hot either, and then the other years were sort of in between those extremes. It's easy to see the headline and the overall story here. In different chart forms, it might be hard to see the variation, for instance, between 2008 and 2009.
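To make the two-variable idea concrete, here is a minimal sketch of a text bar chart in Python. The sales figures are hypothetical, invented only so that the shape matches the story above (2006 best, 2008 worst, 2009 weak); they are not the numbers behind the chart in the video, and `bar_chart` is an illustrative helper, not part of any library.

```python
# Hypothetical yearly sales figures, invented for illustration only.
sales = {2005: 62, 2006: 95, 2007: 70, 2008: 30, 2009: 41, 2010: 66}


def bar_chart(data, width=40):
    """Return a text bar chart, one line per category, scaled to `width`."""
    top = max(data.values())
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; the largest value fills `width`.
        bar = "#" * round(value / top * width)
        lines.append(f"{label} | {bar} {value}")
    return "\n".join(lines)


print(bar_chart(sales))
```

Even in plain text, the longest and shortest bars stand out at a glance, and the smaller gap between 2008 and 2009 is still readable.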
The bar chart's great, it works fantastically, and I would argue whenever you're doing visualization, you should almost always ask yourself, why should I not do a bar chart? That should be the first thing you think of. If you're looking at data over time, the line chart is a really great way to go, sort of by default. Or the timeline, if you're doing an infographic, right? Where it's more time-oriented content but not necessarily quote-unquote data, right? If you're dealing with time, timeline, line chart, great defaults.
By the way, one of the advantages of the bar chart, the line chart, the timeline, and some of these others we're going to talk about is that your audience knows them so well, which means they can immediately recognize them, they know how to read them, they won't be confused by them, and that's a good thing. Another very common example of a chart is the scatter plot. And the bubble chart, which is really just a scatter plot with a third variable added. Now the bubble size is sort of a third variable, so it's telling us a little bit more than just the scatter plot, which covers two variables. There's the oft-reviled pie chart.
Radar charts and spider charts. Tree maps, which are an interesting way of showing hierarchical data. We'll talk more about all of these in other movies. Stream graphs. Matrices. These are all effective and interesting ways of showing and displaying data and comparing things. Sometimes alternative forms really do help to tell a story. So this example is something created by Charles Minard in the 19th century. And what we're looking at here is the march of Napoleon's army on to Moscow, and then their retreat from Moscow.
So if you start on the left, the line is his army, and the thickness of the line shows you how many people were in the army, so 422,000 or so people. And as they march eastward, and we can see geographic information on here, you can see how the army starts to shrink and shrink and shrink, and there are a couple of offshoots where some people go off in different directions. But the army shrinks, shrinks, shrinks until they get to Moscow, and they're a shadow of their former selves. And then they retreat. And they come back and they shrink, shrink, shrink even more.
And notice when they are about two-thirds of the way back, during their retreat, they hit the Berezina River, and look what happens there. Crossing that river, it was incredibly cold, which you can see down below where you look at the temperatures, and they lost half of their men yet again. Again, back, back, back, back, back until you look at that thin line where at one point, right before the other offshoot of people rejoin them, they were down to 4,000 people. Unbelievable losses. A great example of visualization telling an incredibly powerful story. Another example from the past, another 19th century example, is this visualization created by Florence Nightingale.
Now of course she's best known for being a caregiver to soldiers, the lady with the lamp, but she's also known as a medical statistician and a data visualizer. She invented this form, which is called a coxcomb. This one shows monthly mortality rates from the Crimean War, due to sanitation and other causes in wartime hospitals. Florence Nightingale often used visualization when presenting to Parliament, when she was trying to effect changes in these conditions in hospitals during wartime. Maps, of course, are a very common form of data visualization, showing geographic data and also things like election maps, demographic trends, et cetera.
The choropleth, this form that we're looking at here, is probably the most common one you see, where regions are colored to represent data intensity in that particular region. So the darker regions here mean that there is probably more of whatever it is being measured here, the lighter regions less. There is another category of visualizations that you might say are sort of useless, superfluous, you know, what's the point? This is not serious data visualization and yeah, okay, you could certainly say that. But that doesn't mean it's not data visualization, and it doesn't mean that they're not useful in their own way.
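The choropleth rule described above, darker for more and lighter for less, can be sketched as a small function that turns a value into a shade. This is only an illustration: the region names and values are hypothetical, the grey ramp from `#e6e6e6` to `#282828` is an arbitrary choice, and `shade` is not taken from any mapping library.

```python
def shade(value, lo, hi):
    """Map a value in [lo, hi] to a grey hex color; larger values are darker."""
    fraction = (value - lo) / (hi - lo) if hi > lo else 0.0
    # Channel level runs from 230 (light grey) down to 40 (dark grey).
    level = round(230 - fraction * (230 - 40))
    return f"#{level:02x}{level:02x}{level:02x}"


# Hypothetical regions and measurements, invented for illustration only.
regions = {"North": 0, "Central": 50, "South": 100}
lo, hi = min(regions.values()), max(regions.values())
colors = {name: shade(value, lo, hi) for name, value in regions.items()}
print(colors)
```

A real map would apply these colors to region shapes; the sketch only covers the value-to-color step.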
So here's an example of one. This is called the Massive Map of Hip-Hop Monikers. This is data visualization. So let me just click in here, and what we're looking at is a very large data set of hip-hop monikers, right? The names that hip-hop artists go by, and you see the connections between the names and sort of the categorization of them. So for instance, here we're looking at hip-hop monikers that are mineral in nature. Goldie Lock, right? Goldie Locks, Onyx, and you see this connection between that and the parent category, Animal Vegetable Mineral. And if I go over here, and I see this connection over here, Animal Vegetable Mineral goes over to Vegetable, and I can see Blue Raspberry and Casey Veggies, et cetera.
Interesting, not necessarily quote-unquote useful, but very interesting and certainly data visualization. Another example of a visualization where, again, you can sort of say, what is this and why am I looking at it? What's the point? I actually personally love this one. So this is a visualization of pi. There was a computer that ran for something like a year and calculated pi to ten trillion decimal places. Now, this visualization took the first four million of them and laid them out. And so all these little dots, every single individual dot represents one of those digits of pi.
And so, of course, I can come here and just scroll down this entire list to get all 4 million. And as you can see over here in the legend, each color shows which number is represented. And I can roll over the list and see what numbers are beneath my mouse pointer, okay? So, what is the point? It's kind of neat, it's kind of random, what am I looking at? Well, I can easily see, based on this visualization, that this looks like static. This looks completely random. And so, while this is sort of a huh, why is this? What is this for? It does reinforce that pi is random.
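The "looks like static" impression can be checked numerically on a much smaller scale. The sketch below is not how the visualization in the video was built; it assumes Machin's formula for computing pi with plain integer arithmetic, uses only 2,000 decimals rather than 4 million, and the helper names `arctan_inverse` and `pi_decimals` are made up for this example. It counts how often each digit appears, which is the information the color legend encodes.

```python
from collections import Counter


def arctan_inverse(x, unity):
    """Return arctan(1/x) scaled by `unity`, using integer arithmetic only."""
    term = unity // x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        # Taylor series: 1/x - 1/(3x^3) + 1/(5x^5) - ...
        term //= x_squared
        total += sign * (term // divisor)
        divisor += 2
        sign = -sign
    return total


def pi_decimals(count, guard=10):
    """Return the first `count` decimal digits of pi as a string."""
    unity = 10 ** (count + guard)  # extra guard digits absorb truncation error
    # Machin's formula: pi = 4 * (4*arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    digits = str(pi_scaled // 10 ** guard)  # "3" followed by the decimals
    return digits[1:]


counts = Counter(pi_decimals(2000))
for digit in "0123456789":
    print(digit, counts[digit])
```

Counts that come out roughly even across the ten digits are consistent with the static-like picture, though a frequency count on a small sample is only a rough check, not proof of randomness.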
It serves a purpose. And I think it's a really great and interesting way of doing it. So as I said, this has been a very limited definition of data visualization, mostly through example. Hopefully it's a good start for thinking about some of those standard forms, and also some ways people have thought outside of the bar chart when approaching more complex problems.
- Assess an audience in order to create accurate, targeted visualizations.
- Classify the six Ws of a data set and narrow down to the two most important Ws in order to create a visualization.
- Calculate the index of a data set.
- Sketch or diagram a rough plan for a data visualization using analog techniques.
- Distinguish different parts of a data visualization's narrative using visual design techniques.
- Construct a chart or diagram to accurately represent a data set.
- Describe instances in which adding an interactive element to a visualization is appropriate.