Join Bill Shander for an in-depth discussion in this video, The right paradigm: Alternative charts, part of Learning Data Visualization.
Basic charts and graphs are familiar to everyone, and they're great at showing certain types of data. One limitation, though, is the number of variables. What if you have more variables, or more complex information, than you can show in the most basic charts? This video goes through some of those alternatives and when you might use them. One of the alternatives is a chart type called a box plot. And, you know, it's interesting. You actually won't see these in the real world that often, although if you're in the financial world, you see them very frequently, used to show stock market data.
They're actually really good at showing multiple variables. So, let's say this is stock market data. The box plot is showing me the open and close price. Let's say that this is the opening price and this is the closing price of this stock, as well as the average stock price throughout the day. These things are called whiskers, these little dotted lines, the lines above and below. These might be showing me the top price for the day and the bottom price for the day. You can actually change what the whiskers represent. I could say that this is showing me the 90th percentile price for the day.
This is the 10th percentile price for the day. You can decide what you want it to represent. You can even include dots. So sometimes you'll see these little dots show up to show outlier data even beyond the whiskers. So let's say this was showing me the 10th percentile. These dots are representing two things that are, sort of, outside of that scale. Box plots are really great for showing a lot of data in a very simple form. And even if your audience isn't necessarily familiar with them, a good legend can make it easy for users to parse the information quickly. I also really like heatmaps for showing a lot of data, and a lot of variables, at once.
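The box plot summary described above can be sketched in a few lines. This is a minimal illustration, not the tool used in the video: the function names `percentile` and `price_box_summary`, the linear-interpolation percentile rule, and the price list are all assumptions made up for the example. It computes the open, close, and average, puts the whiskers at the 10th and 90th percentiles, and treats anything beyond the whiskers as an outlier dot.

```python
from statistics import mean


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def price_box_summary(prices, low_pct=10, high_pct=90):
    """Summarize one day's prices (given in time order) for a box plot."""
    ordered = sorted(prices)
    whisker_low = percentile(ordered, low_pct)
    whisker_high = percentile(ordered, high_pct)
    return {
        "open": prices[0],            # first price of the day
        "close": prices[-1],          # last price of the day
        "average": mean(prices),
        "whisker_low": whisker_low,   # you choose what the whiskers mean
        "whisker_high": whisker_high,
        # Points beyond the whiskers are the "little dots".
        "outliers": [p for p in ordered if p < whisker_low or p > whisker_high],
    }


# Made-up intraday prices, in time order (illustrative only).
prices = [100.0, 101.0, 99.0, 102.0, 98.0, 103.0, 97.0, 110.0, 90.0, 104.0, 105.0]
summary = price_box_summary(prices)
print(summary)
```

Changing `low_pct` and `high_pct` to 0 and 100 would make the whiskers show the bottom and top price for the day instead, which leaves no outliers to draw.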
It's a very effective form for seeing trends in the data. It's easy to see the correlations and the overlaps between different items. So, here we're looking at the time of day and the day of the week for web visits on a particular website. So, it's easy to see that in the middle of the day, during weekdays, this particular website gets a lot of traffic. Early in the morning, no matter what day of the week, not a ton of traffic. Heatmaps are a great way of looking at a lot of data across a number of variables, and of seeing how one compares to the other, or how groups compare to other groups.
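The day-of-week by time-of-day heatmap boils down to counting visits in a grid and mapping each count to a shade. Here is a rough sketch under stated assumptions: `visit_grid` and `shade` are hypothetical helper names, the timestamps are invented, and text characters stand in for colors.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def visit_grid(timestamps):
    """Return a 7 x 24 grid of visit counts: rows are weekdays, columns are hours."""
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


def shade(count, max_count, ramp=" .:*#"):
    """Map a count to a character; later characters in the ramp mean more visits."""
    if max_count == 0:
        return ramp[0]
    index = round(count / max_count * (len(ramp) - 1))
    return ramp[index]


# Made-up visit timestamps (illustrative only).
visits = [
    datetime(2024, 1, 1, 12, 5),   # Monday, midday
    datetime(2024, 1, 1, 12, 40),  # Monday, midday
    datetime(2024, 1, 1, 13, 15),  # Monday, early afternoon
    datetime(2024, 1, 2, 12, 30),  # Tuesday, midday
    datetime(2024, 1, 6, 5, 10),   # Saturday, early morning
]
grid = visit_grid(visits)
peak = max(max(row) for row in grid)
for name, row in zip(DAYS, grid):
    print(name, "".join(shade(count, peak) for count in row))
```

Each printed row is one day and each character is one hour, so busy midday hours on weekdays show up as the darkest cells.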
Radar or spider charts fall into a category that I would label Use With Caution. They're good at showing multiple variables at a time and showing the relative strengths between items, especially if they're transitioning like this, where I can see how things are changing over time. But kind of like pie charts, it's actually hard for the human eye and brain to parse the absolute values of these things as they're changing. So, for instance, right here on this data point, it's below one, but how much below one? And how much above or below these other ones? It's a little hard to tell.
So the way this chart works is each one of these spines coming out of the center represents a single variable. So, what we're looking at here is data from the eurozone crisis. This pinkish line represents Greece, and the dark blue line represents the eurozone on average. And the entire size of the blue band represents all of the data points in the eurozone. So for instance, if I look at 2012, I can see that Greece, by far, had the worst debt ratio as a percentage of GDP. The highest interest rate, etcetera.
The lowest growth rate. So it is very clear from this chart why Greece was struggling through the crisis. But again, I have a hard time parsing the absolute values of these numbers. And as I turn on all of the countries, this shows one of the great flaws of radar or spider charts, sort of like pie charts: when there is too much data, it's extremely overwhelming and not very helpful. Now, it is animating with all these numbers on here. It's interesting, and if I track one line carefully, I might see it, but it's not the most useful form for something like this.
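The spine layout described above can be sketched as simple geometry: each variable gets one spine at an even angle around the center, and each value sets how far out along its spine the line goes. The helper names `radar_vertices` and `polygon_area` are hypothetical, and the numbers are made up. The area calculation is an illustration of one reason absolute values are hard to read, not something the video states: doubling every value quadruples the enclosed shape.

```python
import math


def radar_vertices(values, maxima):
    """Return (x, y) points for one radar polygon.

    Each variable gets its own spine, spaced evenly around the circle, and
    each value is divided by that variable's maximum so spines are comparable.
    """
    if len(values) != len(maxima):
        raise ValueError("values and maxima must have the same length")
    count = len(values)
    points = []
    for i, (value, maximum) in enumerate(zip(values, maxima)):
        # Start at 12 o'clock and move clockwise, one spine per variable.
        angle = math.pi / 2 - 2 * math.pi * i / count
        radius = value / maximum
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


def polygon_area(points):
    """Shoelace formula for the area enclosed by the radar polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Made-up values on four variables that each max out at 1.0.
maxima = [1.0, 1.0, 1.0, 1.0]
small = radar_vertices([0.5, 0.5, 0.5, 0.5], maxima)
large = radar_vertices([1.0, 1.0, 1.0, 1.0], maxima)
print(polygon_area(small), polygon_area(large))
```

So the sketch works for comparing whether one shape sits outside another, but the size of the shape is a poor guide to the underlying numbers.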
So if your goal is to compare whether A is better than B, and you have a bunch of variables, this form can work. But again, be cautious if you have a lot of variables, or if knowing the absolute values of the numbers is important. Another chart form that I really like if we're looking at multiple variables is something called parallel coordinates. And so here we're looking at four different variables. We're looking at sepal length, petal length, sepal width, and petal width, which are botanical terms. They refer to the lengths and widths of parts of flowers.
And so what's interesting here is that each one of these lines that goes across, so I'm tracking this blue line, shows me one data point. So, where it crosses the vertical line, that's the actual value. So this is an 8 here, and this is a 6.4ish here, etcetera. And while the experience can be somewhat overwhelming when there's a lot of data overall, it is easy to see the trends. It's easy to see, for instance, that there's a decent amount of variety of sepal length amongst this red category, but then there's much less variety for the red category's petal length.
It's also very good at showing relationships between variables. So for instance, I can see that there is some sort of correlation or relationship between the second variable and the fourth variable, petal length and width, amongst this red category, but less correlation when looking at sepal length versus sepal width for that category. There's a lot more spread here than here. One drawback of this particular chart form is that it can be overwhelming when there's a lot of data to look at. But when using an interactive version like this one I'm going to show you, there's sometimes functionality available called scrubbing.
So in this case I can actually click and drag and select just a portion of the data to get a much more limited view, and focus in on what I really want to look at. So say I want to look at just the red lines, or just some of these green and blue lines. I can really narrow it down, and it's a little bit easier to see. If I want to narrow it down further, say, gee, 3.2 over here, that's really interesting, I can actually click and drag over here and scrub just those two. So across all of these and all of these, where is the crossover? What are the data points that are within here and within here? And now it's really easy to see the patterns that I'm looking for.
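Here is a minimal sketch of the two ideas behind the parallel coordinates view: every row of data becomes one line that crosses each vertical axis at its value, and scrubbing (often called brushing) keeps only the rows that fall inside every selected range. The function names `axis_positions` and `brush`, the color-named categories, and the flower measurements are all made up for illustration and are not the data shown in the video.

```python
def axis_positions(rows, columns):
    """Scale each column to 0..1 so every row becomes one line across the axes."""
    lows = {c: min(row[c] for row in rows) for c in columns}
    highs = {c: max(row[c] for row in rows) for c in columns}
    lines = []
    for row in rows:
        line = []
        for c in columns:
            span = highs[c] - lows[c]
            # A column with no spread is drawn at the middle of its axis.
            line.append((row[c] - lows[c]) / span if span else 0.5)
        lines.append(line)
    return lines


def brush(rows, ranges):
    """Keep only rows whose values fall inside every selected (low, high) range."""
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, (low, high) in ranges.items())
    ]


COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Made-up flower measurements (illustrative only).
rows = [
    {"category": "red", "sepal_length": 5.0, "sepal_width": 3.4, "petal_length": 1.5, "petal_width": 0.2},
    {"category": "red", "sepal_length": 4.6, "sepal_width": 3.2, "petal_length": 1.4, "petal_width": 0.2},
    {"category": "green", "sepal_length": 6.4, "sepal_width": 3.2, "petal_length": 4.5, "petal_width": 1.5},
    {"category": "blue", "sepal_length": 7.2, "sepal_width": 3.0, "petal_length": 5.8, "petal_width": 1.6},
]

lines = axis_positions(rows, COLUMNS)

# Scrub one axis, then narrow further by scrubbing a second axis.
one_axis = brush(rows, {"sepal_width": (3.1, 3.3)})
two_axes = brush(rows, {"sepal_width": (3.1, 3.3), "petal_length": (4.0, 5.0)})
print(len(one_axis), [row["category"] for row in two_axes])
```

Selecting on a second axis only ever shrinks the selection, which is why narrowing down makes the remaining pattern easier to see.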
So this example is sort of cheating. Really, I'm just throwing a bunch of standard charts, a bunch of scatter plots, onto a page. This is called a scatter plot matrix. So, I'm able to look at a bunch of variables, and a bunch of different views of those variables, at the same time. This is actually the same data as was in the previous example, the parallel coordinates example. But what you're looking at here is a scatter plot of each of these four variables plotted against each of the others. So for instance, in this corner, I have sepal length and sepal length. I have a row of sepal length and a column of sepal length.
So you see it's a perfect correlation here. One goes up, the other goes up, because it's the same variable. But over here, what I'm looking at is sepal length versus sepal width. Sepal length versus petal length. And so, just by looking at it in this way, I can see the different correlations and patterns, and how different they look. So for instance, if I look at petal length versus petal width, there's a really interesting correlation pattern here that I might not see if looking at this data in a different way. There really are many ways, you could even say infinite ways, to look at data.
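The scatter plot matrix described above is just every variable paired with every other variable, including itself. A rough sketch of that pairing, with a correlation number attached to each panel, might look like this. The names `pearson` and `scatter_matrix_panels` are hypothetical, the measurements are invented, and Pearson correlation is one assumed way to put a number on the pattern you would otherwise judge by eye.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def scatter_matrix_panels(columns):
    """Build one panel per (row variable, column variable) pair."""
    panels = {}
    for row_name, col_name in product(columns, repeat=2):
        xs, ys = columns[col_name], columns[row_name]
        panels[(row_name, col_name)] = {
            "points": list(zip(xs, ys)),  # what the scatter plot would draw
            "r": pearson(xs, ys),
        }
    return panels


# Made-up flower measurements (illustrative only).
columns = {
    "sepal_length": [5.0, 4.6, 6.4, 7.2],
    "sepal_width": [3.4, 3.2, 3.2, 3.0],
    "petal_length": [1.5, 1.4, 4.5, 5.8],
    "petal_width": [0.2, 0.2, 1.5, 1.6],
}
panels = scatter_matrix_panels(columns)
for (row_name, col_name), panel in panels.items():
    print(f"{row_name} vs {col_name}: r = {panel['r']:.2f}")
```

Four variables give sixteen panels, and the diagonal panels, where a variable is plotted against itself, always show a perfect correlation.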
So between the basic charts and these alternative forms, you can cover a lot of ground discovering patterns and trends and making comparisons in your data, even when you have what might seem like an overwhelming amount of data.