Join Bill Shander for an in-depth discussion in this video, Scale, part of Data Visualization for Data Analysts.
- Data analysts, of all people, probably know the basic rules about how to think about scale and data visualization, but I'm going to repeat them here, because I've had some debates, even with very experienced data people about the most basic rule of them all. And the reason the rule is so important is because of those Gestalt Principles that we talked about earlier. So, to start I'm going to share a few charts, okay? So here's a chart. This is a company's sales figures for its six regions, let's say, okay? So A is doing definitely the best.
B is doing pretty well, and then C, D, E, and F, yeah, they're doing okay, but they're certainly not doing as well as the others. Here's another company, and it also has six regions, and boy, all six regions are doing pretty poorly, and yeah, I guess A is doing the best, but not so hot. And finally, we have this last company, and A is killing it, B is close behind, and the rest are doing poorly, especially F, which has barely sold anything, right? So you all are data analysts who are watching this course, and, of course, you all probably very quickly saw that these are all the same chart, the same company, just with different scales, and, of course, the title of this presentation probably gave that away.
But, in any case, what's the problem here? The main problem is that rule, the most important rule of all, which is you must never, ever, ever, ever, ever, and I really mean that, start a bar chart or column chart at any value other than zero. If you've heard this, and if you agree with it completely and you know why it is, please skip ahead, but if you have any qualms, hear me out. The main reason is this: humans are really good at reading the sizes of rectangles, and the nature of a bar chart is that the height of the rectangle is telling you something.
It directly represents the value. But it's not just the height. It's actually the fact that as humans we will immediately interpret the area of the rectangle. We may not know we're doing that, but that's what we're actually thinking of. So the height times the width is the area. So it's about the height, but the fact of the matter is, in that chart on the right-hand side, F looks like it has sold nothing. Even if I see the hundred, even if I'm sophisticated and I understand that the scale is cut off, it still looks like F has sold nothing, and I can't change the way I parse that visually, no matter how hard I try.
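To put rough numbers on that effect, here is a minimal sketch. The sales figures and the helper name `drawn_height_ratio` are invented for illustration (they are not the values on the slide); the only idea taken from the lesson is that a bar's drawn height is its value minus wherever the axis starts.

```python
def drawn_height_ratio(larger, smaller, baseline=0.0):
    """Ratio of two bar heights as actually drawn when the axis starts at `baseline`."""
    if larger <= baseline or smaller <= baseline:
        raise ValueError("both values must sit above the axis baseline")
    return (larger - baseline) / (smaller - baseline)


# Invented sales figures, for illustration only.
region_a = 180
region_f = 105

true_ratio = drawn_height_ratio(region_a, region_f)            # axis starts at 0
truncated_ratio = drawn_height_ratio(region_a, region_f, 100)  # axis starts at 100

print(f"Zero baseline: A's bar is {true_ratio:.2f}x as tall as F's")
print(f"Baseline 100:  A's bar is {truncated_ratio:.2f}x as tall as F's")
```

With these made-up numbers, A really sold about 1.7 times what F sold, but once the axis is cut off at 100, A's bar is drawn 16 times as tall as F's.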
Simply making this a dot plot, rather than a bar chart, would sort of fix that problem, but it's really important to understand that you cannot start bar charts at anything other than zero. It's a nonnegotiable rule. So, I've had clients argue with me on this one, this particular example. So they say, well, you know, you can't see the difference. I need to show the difference between A and F, and so this is one way to do it, isn't it? So what's the answer? Well, maybe it's obvious to some of you. Just chart the difference, instead of the numbers themselves.
If that's the story, show that, and there are plenty of ways to do that. Okay, but you insist you have to show the actual numbers, not the change. This is what I was saying before, dot plot, right? You can just have little lines or little dots. I won't hold this against you if you do it this way, just don't use bars. By the way, this rule really is only about bar charts. Many other chart forms are okay to use with different y-axes, right, different y-scale. See this line chart. It's okay that the bottom starts at some number other than zero.
I'm looking at the trend of the line over time, not the area, right? It's not the area of a line that matters. So, a nonzero starting point is fine for this. Now, an area chart works more like a bar chart. The area, as the name implies, means that it's the area that matters. So you have to start a chart like this at zero. Starting an area chart at a nonzero value has the same misrepresentation opportunity. How about the other issue with bar charts? What about pushing the top end of the scale up? This poses a different problem.
So right now we're emphasizing how little each division has sold. Now if we intend to do this, right? If our scale is intentional because we have a much higher target that they're supposed to be hitting, then that's fine. So this company, they should all be selling 1600, and they're not. Okay, then they are all failing. That's okay to use this scale. But if we're using this scale just because we're trying to manipulate our audience to feel that the numbers are too low, then that's a problem, and this gets to one of the biggest issues with setting the scale for a visualization. There's a lot of room for disingenuousness and downright dishonesty with scales.
Be fair, be accurate with your scales. Don't manipulate them, and don't break that cardinal rule. Another rule I follow pretty religiously, though I can be convinced of exceptions, is to use round numbers. A lot of software we use will kick out a chart and set the scales for us. So sometimes it'll pick the best numbers from an accuracy standpoint, but they're not human. They're not accessible. Your audience will always appreciate more accessible, round, memorable numbers. So this chart, instead of starting at 4,472 and going up to 10,472, should go from 4,500 to 10,500.
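The rounding in that example can be sketched in a few lines. This is only an illustration: the step of 500 and the function names are assumptions, not something any particular charting tool does for you. Rounding to the nearest step reproduces the 4,500 and 10,500 from the example; the second helper rounds outward instead, which may be the safer choice when every data point has to stay inside the axis.

```python
import math


def round_limits(low, high, step):
    """Round both axis limits to the nearest multiple of `step`."""
    nearest = lambda value: math.floor(value / step + 0.5) * step
    return nearest(low), nearest(high)


def expand_limits(low, high, step):
    """Round the lower limit down and the upper limit up, so no data is cut off."""
    return math.floor(low / step) * step, math.ceil(high / step) * step


# Software-chosen limits from the example; the step of 500 is an assumption.
print(round_limits(4472, 10472, 500))
print(expand_limits(4472, 10472, 500))
```

Whichever variant you pick, remember that for a bar or column chart the lower limit is not up for discussion: it stays at zero.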
I have these circles that represent something about the data. In this case, this is the population of countries. So, this big red one here is India, and this big blue one is China. This is the United States over here. And it's really important to remember that the area, as I mentioned earlier, is what humans sort of parse when they're thinking about volume and size. So you don't want to double the diameter or double the radius of a circle. What you want to do is double the area of a circle. So if something is double the size of another, it's doubling the area, and the equation for the area of a circle is pi r squared.
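Here is a minimal sketch of that area rule. The function name and the `area_per_unit` parameter are hypothetical; the arithmetic is just the area formula solved for the radius, r = sqrt(area / pi), so doubling the value grows the radius by the square root of two rather than by two.

```python
import math


def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose AREA (not radius) is proportional to `value`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    area = value * area_per_unit
    return math.sqrt(area / math.pi)


# Two made-up values, one double the other.
small = radius_for_value(100)
large = radius_for_value(200)

print(f"radius ratio: {large / small:.3f}")
print(f"area ratio:   {(math.pi * large ** 2) / (math.pi * small ** 2):.3f}")
```

If you doubled the radius instead, the larger circle would cover four times the area and read as four times the value.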
So this is a valid visualization technique and not something you should avoid. However, you need to be very careful in notating your charts clearly when using a logarithmic scale, especially if your audience is made up of unsophisticated chart readers. There's a huge opportunity to confuse your audience in this case, so please do your best to explain the logarithmic scale in your notes and explain why you're using it. Even a real pro in data analysis can make simple mistakes in setting scales. This lesson may be very obvious to some of you, but I hope it was a good reminder if it was.
And for the rest of you, take what I've said to heart. These simple rules will help you create visualizations that are more accurate, more honest, more accessible, and more effective.
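As a recap, the baseline rule from this lesson can be written down as a small helper. This is a sketch under assumptions: the chart-type labels and the function name are hypothetical, and it simply encodes "bars, columns, and areas include zero; lines and dots may start elsewhere."

```python
# Chart forms where the filled shape encodes the value, so the axis must include zero.
ZERO_BASELINE_CHARTS = {"bar", "column", "area"}


def axis_minimum(chart_type, data_min):
    """Lowest axis value that is acceptable for the given chart form."""
    if chart_type.lower() in ZERO_BASELINE_CHARTS:
        # Never start above zero; extend below it only if the data is negative.
        return min(0.0, float(data_min))
    # Line charts and dot plots may start at a value other than zero.
    return float(data_min)


print(axis_minimum("bar", 4472))
print(axis_minimum("line", 4472))
```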
- Why visual communications matter, and how they work
- Communicating via story
- Communicating with color
- Using legends and sources
- Sketching and wireframing
- Rethinking slides, charts, and diagrams
- Rethinking your templates and brand guidelines