- [Instructor] Visualizations consist of two components: the space where the data is displayed, and the objects that represent the data. A lot of this course is going to be about the data itself, right? How to think about it, how to display it, how to make it compelling, etc. In this video, we're going to concentrate on the space where the data is displayed. We're mostly going to be looking at 2D planes here. There is such a thing as 3D visualization, but that's outside the scope of this conversation. The point is that the size of that plane is very important.
The scale drastically affects the visualization and the story that it tells. So here we have a bar chart with some interesting and easy-to-see patterns, right? What's going on? A is doing well, B is doing pretty well, and the rest are doing less well. Here we have another chart. No one's doing that well in this one, right? A is slightly better than the rest, but they're all doing pretty poorly. It's also hard to see the variation between the groups: D, E, and F look identical here. In this third chart, all of them except A and B are doing really poorly.
Now maybe you noticed that these charts are actually all the same. If you look at the Y axis, you'll see that the data values are identical in all three. It's just the scale that's changing. In the left-hand chart, the scale goes from zero to 250. In the right-hand chart, it's from 100 to 240, and in the middle one it's from zero to 2,000. And that dramatically affects your perception of the data that you're looking at. This is true of all charts. There's no way to display data visually and divorce it from the scale in which it's being displayed.
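To make the effect concrete, here is a minimal sketch that computes how much of the plot height each bar would fill under the three axis ranges mentioned above. The group values are hypothetical (the video does not state them), and `drawn_fraction` and `describe` are illustrative helper names, not part of any charting library.

```python
# Hypothetical values for groups A-F; the video does not give exact numbers.
values = {"A": 230, "B": 190, "C": 130, "D": 120, "E": 115, "F": 110}

# The three axis ranges mentioned in the video.
scales = {"left": (0, 250), "middle": (0, 2000), "right": (100, 240)}


def drawn_fraction(value, axis_min, axis_max):
    """Fraction of the plot height a bar occupies, clamped to [0, 1]."""
    fraction = (value - axis_min) / (axis_max - axis_min)
    return max(0.0, min(1.0, fraction))


def describe(values, scales):
    """Map each scale name to {group: drawn fraction of the plot height}."""
    return {
        name: {g: round(drawn_fraction(v, lo, hi), 3) for g, v in values.items()}
        for name, (lo, hi) in scales.items()
    }


if __name__ == "__main__":
    for name, fractions in describe(values, scales).items():
        print(name, fractions)
```

With these assumed values, F fills a bit under half of the plot on the zero-to-250 scale, but only a sliver of it on the other two scales, even though the underlying number never changes.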
But you might ask, how can you know how to set the scale for a chart? When you look at these three charts, which one is right? Which is the correct way to do it? It sort of depends, but there are definitely some good rules to follow. In this particular example, the one on the left is, quote unquote, "right," and by the end of this video, you'll probably agree. So, the first thing to consider is, am I using a chart type that requires a certain approach to scaling? For the most part, the answer is no, but there are some chart types that do have specific rules to follow.
So, as an example, bar charts really should always start at zero. And here's why: the height of the bars in a bar chart actually means something. If you look at F in the right-hand example, for instance, that bar looks like it has practically no data in it. But the fact is, the axis already starts at 100 at that point. The left-hand example is an accurate representation. I can see that F has a bunch of data. It's above 100, and it has less than A, maybe, but it's clearly there. The missing portion of the bar in the right-hand example just isn't fair. It doesn't accurately represent how much data is in that F category.
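The distortion described above can be put into numbers. The sketch below compares the ratio of two drawn bar heights with the true ratio of the values when the axis starts somewhere other than zero. The values 230 and 110 for A and F are assumptions chosen for illustration, and `drawn_ratio` and `distortion` are hypothetical helper names.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars a and b when the axis starts at baseline."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must be above the baseline")
    return (a - baseline) / (b - baseline)


def distortion(a, b, baseline):
    """Factor by which the drawn ratio exaggerates the true ratio a / b.

    A result of 1.0 means the bars are drawn in true proportion.
    """
    return drawn_ratio(a, b, baseline) / (a / b)


if __name__ == "__main__":
    a, f = 230, 110  # assumed values for groups A and F
    for baseline in (0, 100):
        print(
            f"baseline={baseline}: A is drawn {drawn_ratio(a, f, baseline):.2f}x "
            f"as tall as F (distortion {distortion(a, f, baseline):.2f}x)"
        )
```

Under these assumptions, A is a little more than twice F in value, yet with a baseline of 100 its bar is drawn thirteen times as tall. A baseline of zero is the only one that keeps the distortion factor at 1.0 for every pair of bars.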
There are plenty of other chart types that don't have to start at zero, but bar charts really always should. The second question to consider is, am I comparing things in a self-contained context, within the context of the chart? So, in this example, let's say you have a chart that has no external reference. There's no need to put these numbers in the context of other numbers outside of the data that you're actually displaying. Here, we're looking at widget sales for a company. I generated this chart in Excel, and it automatically exported it with a scale that started at zero and went up to $12,000.
But as you can see, the numbers only run from $5,500 to $10,000 or so. So since I don't need to compare this to any other numbers, and this is a line chart rather than a bar chart, I don't need to show the numbers below 5,500. I don't need to show the entire scale. I'm just showing a change over time. So I can actually set the scale here to 5,000 to 10,000 if I want to, right? I have a nice round top and bottom number on my graph, and I'm telling a complete story. Or, I could subtract a fixed amount from the minimum value and add that same amount to the maximum value, so that there's exactly the same buffer below the lowest point and above the highest point in this chart.
So this is actually the most balanced visually, right? There's exactly the same number of pixels above and below the line. But you end up with kind of weird numbers on the axis, right? 4,472 to 10,472. I'm not a big fan of this approach. More importantly, research has shown that people are much better at remembering round numbers, so using random-seeming, non-round numbers like these won't help your audience understand or remember your data. I could also set what is essentially an arbitrary scale. Let's say that I'm a sales manager at this company, I have a sales target for my people of selling $20,000 worth of product, and I want to show these numbers in the context of that reference.
I'm manipulating the scale for a valid purpose. I'm not intending to influence the perception of the data for evil purposes. Within the valid context of this sales target, I'm just showing people where they stand. It's okay to change scales for good reasons, but you really do have to have a solid reason, and you have to be consistent about it. Another influence on scale is whether or not you have an external reference, right? That is, some number that comes from outside the context of the data you're displaying. So here we have the same chart, and this is very similar to the last example, where we were showing it in the context of an internal sales target.
Now sometimes you might want to set the scale based on an external target, right? This is not about showing my salespeople how they're doing compared to a target that I set. It's about an external number, right? This is the total widget sales in the entire marketplace, so I want to show my numbers within that context. It's a very similar motivation. I'm just setting context and comparing the data to a number. But if that number is external to you and external to your data, you have to do it that much more carefully and thoughtfully, so that you avoid the perception of bias on your part.
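The scale choices discussed for the widget-sales chart (round limits, an equal buffer above and below, and a scale stretched to reach a reference number) can be sketched as three small functions. The sales figures are hypothetical, chosen so that a buffer of 1,028 reproduces the 4,472 to 10,472 axis quoted earlier. The function names and the `step` parameter are assumptions made for illustration, and the video does not say which lower limit it used with the $20,000 target.

```python
import math

# Hypothetical widget sales; min 5,500 and max 9,444 are assumed values.
sales = [5500, 6200, 7100, 6800, 8300, 9444]


def round_limits(data, step):
    """Axis limits snapped outward to the nearest multiples of step."""
    lower = math.floor(min(data) / step) * step
    upper = math.ceil(max(data) / step) * step
    return lower, upper


def padded_limits(data, pad):
    """Axis limits with the same buffer below the minimum and above the maximum."""
    return min(data) - pad, max(data) + pad


def limits_with_reference(data, reference, step):
    """Round axis limits that also include a reference value such as a target."""
    return round_limits(list(data) + [reference], step)


if __name__ == "__main__":
    print("round:    ", round_limits(sales, 1000))
    print("padded:   ", padded_limits(sales, 1028))
    print("reference:", limits_with_reference(sales, 20000, 1000))
```

Printing the three results side by side is a quick way to compare candidate scales before committing to one. Whichever you pick, it should be the one you can justify to your audience.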
And speaking of bias, that's always a great question to ask yourself: am I being fair and unbiased? Especially when thinking about scale, you have to think about this very, very carefully. You don't want to be that guy, right? You don't want to put on a suit and look all pretty and legitimate and then play games behind people's backs. No one likes a cheater. So it's a really good exercise, when you're creating visualizations, to look at your chart with different scales. I would recommend that you experiment with scales. Change it up and see how it looks. Think about whether or not you're being accurate, and think about whether you have a reason for setting a particular scale, if you are. Above all, look for bias, and eliminate it whenever you can.
And really, when in doubt, channel your audience. Think about a few things. First, think about people who don't know the data and don't know the story you're trying to tell, and make sure that your scale is going to help them understand your data and get the story that you're trying to communicate. But especially, think about your audience in terms of two categories: your skeptics and your believers. When you set a scale, ask yourself, are the skeptics going to believe this? Are they going to buy it? Are they going to think that this is an honest representation of the data? And the same thing goes on the believers' side.
Make sure that when you present data in a certain scale, you're not just reinforcing the believers. You're not just giving them the data story that they want. Make sure that it's a really valid, accurate representation of the data that you're sharing. Use your powers of scaling for good, not evil.
- Describe the process by which individuals’ interests are incorporated into data visualizations.
- Differentiate the use of the Ws in data visualization.
- Explain techniques involved in defining your narrative when visualizing data.
- Identify the factors that make data visualizations relatable to an audience’s interests and needs.
- Review the appropriate use of charts in data visualizations.
- Define the process involved in applying interactivity to data visualizations.