Pie charts are one of the most commonly used data visualizations for exploring the composition of data. They are easy to understand, but they do have some shortcomings. Data visualization expert Matt Francis explains why care must be taken when using pie charts, and he offers some important guidelines to help you use them correctly.
- [Instructor] Of all the ways to visualize the composition of a data set, the pie chart is probably the most commonly used. But it's also one of the most abused visualization types. Used correctly, it can be effective, but far too often it gets misused. So let's look at some examples of both good and bad pie charts, and some of the things you should avoid if you are going to use them. A pie chart shows the overall composition of a data set by encoding each value in the angle of its segment. Let's build an example and see how that works.
Say that in our data set we're interested in looking at overall sales broken down by region. First we'll add the region, and then we'll put sales on the angle. And there we have a very simple pie chart. What we've done is take all of the data, subdivide it by region, and then encode the amount of sales in the angle of each segment. Now, this shows one of the problems with using a pie chart: which segment is the biggest? Pie charts rely on angle and area, and it's very difficult to compare those accurately.
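To make the angle encoding concrete, here is a minimal Python sketch of the arithmetic behind a pie chart: each segment's angle is its share of the total multiplied by 360 degrees. The region names follow the example, but the sales figures are hypothetical values chosen only for illustration.

```python
# Hypothetical sales figures; only the region names come from the example.
regional_sales = {"West": 725_000, "East": 679_000, "South": 392_000, "Central": 501_000}

def pie_angles(values: dict[str, float]) -> dict[str, float]:
    """Return the angle in degrees that each segment occupies in a pie chart."""
    total = sum(values.values())
    # A segment's angle is its fraction of the whole, scaled to a full circle.
    return {label: value / total * 360 for label, value in values.items()}

for region, angle in pie_angles(regional_sales).items():
    print(f"{region:<8} {angle:6.1f} degrees")
```

With values this close together, the west and east segments differ by only a few degrees, which is exactly the kind of difference the eye struggles to judge.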
Pie charts are good for giving an overall impression of how the data is distributed. But in this example it's very hard to tell which segment is the largest and which is the smallest. What we can say, however, is that there are four segments in this data. The west and east look fairly even, and the south and central also look fairly even. That's about all we can really tell from this chart. One thing pie charts are quite good for, however, is combining segments. For example, we could say that the west and the south together make up roughly half of the data.
With other visualizations it's hard to make that same kind of association, but the pie chart lets us join those segments together. Now, pie charts only work well with a small number of segments. Three to four is probably about the maximum you want to deal with; any more than that and the pie chart starts getting messy. Let's look at a bad example. This one shows the sales for each of our states: the angle is the amount of sales, and the segments are colored by state. The problem here is that we've got 49 states being displayed.
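The part-to-whole reasoning above, and the rule of thumb on segment count, can be sketched in a few lines of Python. The sales values are hypothetical, and the limit of four segments is taken from the guideline in the text rather than from any formal standard.

```python
# Hypothetical regional sales; the values are illustrative, not the course's data.
regional_sales = {"West": 725_000, "East": 679_000, "South": 392_000, "Central": 501_000}

MAX_SEGMENTS = 4  # rule of thumb from the text: three to four segments at most

def combined_share(values: dict[str, float], labels: list[str]) -> float:
    """Fraction of the whole covered by the given segments taken together."""
    total = sum(values.values())
    return sum(values[label] for label in labels) / total

def too_many_segments(values: dict[str, float], limit: int = MAX_SEGMENTS) -> bool:
    """True when a pie would have more segments than the rule of thumb allows."""
    return len(values) > limit

share = combined_share(regional_sales, ["West", "South"])
print(f"West + South: {share:.0%} of total sales")  # roughly half with these values
print("Too many segments for a pie?", too_many_segments(regional_sales))
```

A chart with one segment per state would fail the check immediately, which is the situation the next example runs into.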
But we can only show 20 distinct colors, so the colors have started repeating. If we look here, we can see three dark blues. But which states do they refer to? Even with the legend, we can't tell. Also, which is the third-biggest state? We can see that the peach-colored segment is the biggest, probably followed by the red one. But is the next one that blue, or the green at the top? It's really hard to see any kind of ranking in this visualization. At the moment, the segments are sorted alphabetically, and we could change that to sort them by the amount of sales.
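The repeating colors follow from simple modular arithmetic. The sketch below assumes a tool that assigns palette colors in order and wraps around once it runs out, so the state at position `i` gets color `i % 20`. That cycling behavior is an assumption for illustration, and the state labels are placeholders rather than real state names.

```python
from collections import defaultdict

PALETTE_SIZE = 20  # the text mentions 20 distinct colors
states = [f"State {i + 1}" for i in range(49)]  # placeholder names for 49 states

# Assumption: colors are assigned in order and wrap around when the palette runs out.
color_of = {state: i % PALETTE_SIZE for i, state in enumerate(states)}

# Group the states by the color index they were given.
states_by_color = defaultdict(list)
for state, color in color_of.items():
    states_by_color[color].append(state)

shared = {color: group for color, group in states_by_color.items() if len(group) > 1}
print(f"{len(shared)} of {PALETTE_SIZE} colors are shared by more than one state")
print("Color 0 is used by:", ", ".join(states_by_color[0]))
```

With 49 segments and 20 colors, every color ends up on at least two states and some on three, so the legend alone can no longer identify a segment.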
In this case, we'll sort from smallest to largest. Now we do have some kind of order. However, what we can't tell is how much bigger the peach state, California, is compared to, say, Florida. Comparing angles and areas against each other is a really difficult thing to do. A better way to visualize this data would have been a simple bar chart. Here we can see California and Florida, and we can compare the actual values between the two.
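A bar chart works better because every bar starts from a common baseline, so lengths can be compared directly. As a rough illustration, this sketch sorts hypothetical per-state sales and prints a text bar chart on a shared scale, then computes the direct ratio that the pie chart hides. The state names and values are placeholders, not figures from the course data.

```python
# Hypothetical per-state sales; names and values are placeholders.
state_sales = {"State A": 457_000, "State B": 89_000, "State C": 310_000, "State D": 116_000}

def text_bar_chart(values: dict[str, int], width: int = 40) -> list[str]:
    """Render values as horizontal text bars, largest first, on a shared scale."""
    largest = max(values.values())
    rows = []
    for label, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(value / largest * width)  # bar length is proportional to value
        rows.append(f"{label:<8} {bar} {value:,}")
    return rows

print("\n".join(text_bar_chart(state_sales)))
ratio = state_sales["State A"] / state_sales["State B"]
print(f"State A sold {ratio:.1f}x as much as State B")
```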
We can see much more detail here than we could in the pie chart. Another case where pie charts get abused is showing change over time. In this example, we're looking at regional sales over four years. The question is, are sales increasing or decreasing for each of our regions? Well, it's really hard to tell. At first glance, it looks like the west was fairly static for three years and then increased in 2015. Similarly, the east started off at just over a quarter or so, then increased, and then looks like it decreased again.
But it's very hard to see the ebb and flow of the data across time. A better way to visualize time is with a line chart. If we convert these pies to lines, we see something like this. We can see that the east has been steadily increasing, whereas the west in fact dropped and then increased. Let's compare these side by side to really see the difference. Again, looking at the west in the pie chart, the segments for 2012 and 2013 look almost identical, but the line chart shows a definite drop between those two values.
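A line chart plots each region's value per year on a common axis, so rises and drops appear as slopes. The sketch below computes the year-to-year changes that a line chart makes visible. The years follow the example, but the sales figures are hypothetical, chosen to mirror the pattern described (the west dips and then recovers, the east climbs steadily).

```python
# Hypothetical yearly sales per region (years from the example; values invented).
sales_by_year = {
    "West": {2012: 147_000, 2013: 139_000, 2014: 187_000, 2015: 251_000},
    "East": {2012: 128_000, 2013: 156_000, 2014: 180_000, 2015: 214_000},
}

def year_over_year(series: dict[int, int]) -> dict[int, int]:
    """Change from the previous year, keyed by the later year."""
    years = sorted(series)
    return {later: series[later] - series[earlier] for earlier, later in zip(years, years[1:])}

for region, series in sales_by_year.items():
    changes = year_over_year(series)
    trend = ", ".join(f"{year}: {change:+,}" for year, change in changes.items())
    print(f"{region}: {trend}")
```

A negative value such as the west's 2013 change is the "definite drop" a line chart shows as a downward slope, while a pie only shows each year's segment as an angle.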
This is information and detail we wouldn't have got from the pie chart, simply because we can't accurately compare angles and areas. So a pie chart isn't a particularly good choice when we're looking for detail, but when we want a general feel for the data, it can work quite well. Another use of pie charts is placing them on a map. Here we're looking at sales per state, subdivided by each of our categories. Again, we can't see much detail, but we can see a general summary of the distribution, and some interesting things crop up.
For example, in Wyoming we only sell furniture, whereas in North Dakota we only sell office supplies. In Montana, technology seems to be the biggest category, whereas in Kansas it's office supplies. The pie chart works well here because we're not comparing one state against another: we treat each state as an independent entity and look at the composition of the data within it. Pie charts are not a bad data visualization, but they are often misused and abused.
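Each pie on the map is scaled to its own state's total, which is why it shows composition rather than size. This sketch computes category shares separately for each state. The state labels and figures are hypothetical; one state sells only a single category, echoing the Wyoming and North Dakota examples.

```python
# Hypothetical category sales for three placeholder states.
category_sales = {
    "State X": {"Furniture": 1_600, "Office Supplies": 0, "Technology": 0},
    "State Y": {"Furniture": 2_000, "Office Supplies": 5_500, "Technology": 8_200},
    "State Z": {"Furniture": 3_100, "Office Supplies": 9_400, "Technology": 4_000},
}

def composition(values: dict[str, float]) -> dict[str, float]:
    """Share of each category within one state; each state's shares sum to 1."""
    total = sum(values.values())
    return {category: value / total for category, value in values.items()}

for state, categories in category_sales.items():
    shares = composition(categories)
    top = max(shares, key=shares.get)
    print(f"{state}: largest category is {top} ({shares[top]:.0%})")
```

Because every pie is normalized to its own total, the shares say nothing about which state sells more overall, which is why comparing one pie against another is best avoided.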
If you keep the number of segments to a minimum and don't compare one pie chart against another, they can be a very effective, simple data visualization that everybody knows how to read.
- Explain the importance of data visualization.
- Determine when an infographic or an exploratory dashboard would be most appropriate.
- Identify the elements of a Gantt chart.
- Recall the characteristics of four kinds of distribution.
- Recognize the drawbacks of using a pie chart.
- Explain when to use a symbol map.