[Instructor] Basic charts and graphs are familiar to everyone, and they're great at showing certain types of data. One limitation, though, is the number of variables they can handle. What if you have more variables, or more complex information, than the most basic charts can show? This video goes through some of the alternatives and when you might use them. One of them is a chart type called a box plot. Interestingly, you won't see these very often in everyday life, although if you work in finance you see them frequently, for example in stock market data.
They're actually really good at showing multiple variables. Let's say this is stock market data. These box plots show the open and close price: let's say this edge of the box is the open price and this one is the close price of the stock, along with the average stock price throughout the day. (In finance, this open/close variant, with whiskers reaching the day's high and low, is usually called a candlestick chart.) These little dotted lines above and below the box are called whiskers. They might show the highest price for the day and the lowest price for the day. You can also change what the whiskers represent. I could say that the top whisker shows the 90th percentile price for the day and the bottom one shows the 10th percentile price.
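To make the percentile whiskers concrete, here is a minimal Python sketch of the numbers a box plot is drawn from. It assumes the whiskers end at the 10th and 90th percentiles, as in the example above, and it uses the median for the middle line, which is the usual box plot convention (the video describes that line as the day's average). Values beyond the whiskers are returned as outliers, the "dots" some box plots show. The function name `box_plot_summary` and the sample prices are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker_low=10, whisker_high=90):
    """Summarize values the way a percentile-whisker box plot draws them.

    The box spans the 25th to 75th percentile with a line at the median.
    The whiskers end at the whisker_low and whisker_high percentiles
    (integers from 1 to 99), and any value beyond them is returned as an
    outlier (one of the "dots").
    """
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles
    cuts = statistics.quantiles(values, n=100, method="inclusive")

    def percentile(p):
        return cuts[p - 1]

    low, high = percentile(whisker_low), percentile(whisker_high)
    return {
        "whisker_low": low,
        "q1": percentile(25),
        "median": statistics.median(values),
        "q3": percentile(75),
        "whisker_high": high,
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical intraday prices for one stock (illustrative, not real data)
prices = [101.2, 101.8, 102.0, 102.3, 102.5, 102.9, 103.1, 103.4, 103.8, 109.5]
summary = box_plot_summary(prices)
print(summary)
```

Many plotting tools end the whiskers at 1.5 times the interquartile range by default instead; the percentile version here simply mirrors the choice described in the video.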
You can decide what you want them to represent. You can even include dots; sometimes you'll see these little dots show up to mark outliers even beyond the whiskers. So let's say the bottom whisker was showing me the 10th percentile: these dots would represent values that fall outside that range. Box plots are great for showing a lot of data in a very simple form, and even if your audience isn't familiar with them, a good legend can make it easy for them to parse the information quickly. I also really like heat maps for showing a lot of data and a lot of variables at once.
It's a very effective form for seeing trends in the data, and it's easy to see the correlations and overlaps between different items. Here we're looking at web visits to a particular website by time of day and day of the week. It's easy to see that in the middle of the day, on weekdays, this website gets a lot of traffic, while early in the morning, no matter what day of the week, it doesn't get much. Heat maps are a great way of looking at a very large amount of data across a number of variables, and of seeing how one item compares to another or how groups compare to other groups.
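The data behind a heat map like this one is just a grid of counts. Here is a minimal Python sketch, assuming the raw data is a list of visit timestamps: it buckets each visit by weekday and hour, then scales the counts to 0..1 so the busiest cell can get the darkest color. The helper names `visits_heat_grid` and `shade` and the visit log are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def visits_heat_grid(timestamps):
    """Count visits into a 7 x 24 grid: one row per weekday, one column per hour."""
    # datetime.weekday() is 0 for Monday ... 6 for Sunday
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    # A Counter returns 0 for missing keys, so empty cells stay at 0
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


def shade(grid):
    """Scale each count to 0..1 so it can be mapped to a color intensity."""
    peak = max(max(row) for row in grid) or 1  # avoid dividing by zero
    return [[count / peak for count in row] for row in grid]


# Hypothetical visit log (illustrative timestamps, not real traffic)
log = [
    datetime(2018, 12, 17, 12, 5),   # Monday, 12:05
    datetime(2018, 12, 17, 12, 40),  # Monday, 12:40
    datetime(2018, 12, 19, 13, 15),  # Wednesday, 13:15
    datetime(2018, 12, 22, 6, 30),   # Saturday, 06:30
]
grid = visits_heat_grid(log)
for name, row in zip(DAYS, shade(grid)):
    print(name, " ".join(f"{v:.1f}" for v in row))
```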
Radar or spider charts fall into a category that I would label "use with caution." They're good at showing multiple variables at a time and the relative strengths of different items, especially when they transition like this one, so I can see how things change over time. But like pie charts, they make it hard for the human eye and brain to parse the absolute values, especially as those values change. For instance, this data point right here is below one, but how far below one, and how does it compare with these other ones? It's a little hard to tell.
Here's how this chart works: each of these spines coming out of the center represents a single variable. What we're looking at here is data from the Eurozone crisis. This pinkish line represents Greece, the dark blue line represents the Eurozone average, and the full width of the blue band covers all of the data points in the Eurozone. For instance, if I look at 2012, I can see that Greece had by far the worst debt ratio as a percentage of GDP, the highest interest rate, the lowest growth rate, and so on, so it's very clear from this chart why Greece was struggling through the crisis.
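Under the hood, each spoke is just an angle, and a value is a distance along it. This Python sketch computes the corners of one radar-chart polygon, assuming non-negative values scaled against a per-spoke maximum. The function name `radar_vertices` and the scores are hypothetical, and real charting tools may scale or orient the spokes differently.

```python
import math


def radar_vertices(values, max_values):
    """Return the (x, y) corners of one radar-chart polygon.

    Spokes are spaced evenly around the circle, starting at 12 o'clock and
    going clockwise. A value's distance from the center is its fraction of
    that spoke's maximum, so values are assumed to be non-negative.
    """
    n = len(values)
    points = []
    for i, (value, vmax) in enumerate(zip(values, max_values)):
        angle = math.pi / 2 - 2 * math.pi * i / n  # angle of spoke i
        radius = value / vmax  # 0 at the center, 1 at the rim
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical scores on four spokes, each with a maximum of 4
for x, y in radar_vertices([2, 1, 4, 3], [4, 4, 4, 4]):
    print(f"({x:.2f}, {y:.2f})")
```

Because each value is encoded only as a distance along its own spoke, there is no shared baseline to compare against, which is one reason absolute values are hard to read off these charts.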
But again, I have a hard time parsing the absolute values of these numbers. And when I turn on all of the countries, it shows one of the great flaws of radar or spider charts: as with pie charts, when there's too much data it's overwhelming and not very helpful. When it's animating with all of these countries turned on, it's interesting, and if I track one line carefully I might be able to follow it. But it's not the most useful form for something like this. So if your goal is to compare whether A is better than B across a bunch of variables, this form can work.
But again, be cautious if you have a lot of variables and/or if knowing the absolute values of the numbers is important. Another chart form that I really like for looking at multiple variables is called parallel coordinates. Here we're looking at four different variables: sepal length, petal length, sepal width, and petal width. These are botanical terms referring to the lengths and widths of parts of a flower. What's interesting here is that each one of these lines that goes across, like this blue line I'm tracking, represents one data point.
Where it crosses each vertical axis is its actual value for that variable, so this is an eight here, this is about 6.4 here, and so on. When there's a lot of data, the overall experience can be somewhat overwhelming, but it's easy to see trends. For instance, it's easy to see that there's a decent amount of variation in sepal length within this red category, but much less variation in the red category's petal length. It's also really good at showing relationships between variables.
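Drawing a parallel coordinates chart comes down to placing one vertical axis per variable and turning each data point into a polyline across those axes. Here is a minimal Python sketch, assuming each axis is min-max scaled independently (a common convention, not the only one) while the axis labels would still show the actual values. The function name `parallel_coordinates` and the measurements are illustrative.

```python
def parallel_coordinates(rows, columns):
    """Map each row (a dict) to the polyline a parallel-coordinates chart draws.

    Axis i sits at x = i. Each value is min-max scaled to 0..1 within its
    own column, so every vertical axis gets its own range.
    """
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}
    # A flat column would have a zero span; fall back to 1 to avoid dividing by zero
    spans = {c: (highs[c] - lows[c]) or 1.0 for c in columns}
    return [
        [(x, (row[c] - lows[c]) / spans[c]) for x, c in enumerate(columns)]
        for row in rows
    ]


COLUMNS = ["sepal_length", "petal_length", "sepal_width", "petal_width"]
# A few illustrative flower measurements, in the axis order used in the video
flowers = [
    {"sepal_length": 5.1, "petal_length": 1.4, "sepal_width": 3.5, "petal_width": 0.2},
    {"sepal_length": 7.0, "petal_length": 4.7, "sepal_width": 3.2, "petal_width": 1.4},
    {"sepal_length": 6.3, "petal_length": 6.0, "sepal_width": 3.3, "petal_width": 2.5},
]
for line in parallel_coordinates(flowers, COLUMNS):
    print([round(y, 2) for _, y in line])
```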
For instance, I can see that there's some sort of correlation between the second and fourth variables, petal length and petal width, within this red category, but less correlation between sepal length and sepal width for that category; there's a lot more spread here than here. One drawback of this chart form is that it can be overwhelming when there's a lot of data to look at. But interactive versions, like the one I'm going to show you, sometimes offer a feature called scrubbing (also commonly called brushing). In this case, I can click and drag to select just a portion of the data, get a much more limited view, and focus on what I really want to look at.
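Scrubbing, which many visualization tools call brushing, amounts to a filter: each selection on an axis is a range, and a line stays visible only if it falls inside every selected range. Here is a minimal Python sketch of that logic; the `brush` name, the inclusive ranges, and the data are assumptions for illustration, not any particular tool's API.

```python
def brush(rows, ranges):
    """Keep only the rows whose values fall inside every brushed range.

    `ranges` maps a column name to an inclusive (low, high) interval, like a
    selection dragged on that axis. Brushing two axes keeps only the lines
    that pass through both selections; no ranges at all keeps every row.
    """
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, (low, high) in ranges.items())
    ]


# A few illustrative flower measurements
flowers = [
    {"sepal_length": 5.1, "petal_length": 1.4, "sepal_width": 3.5, "petal_width": 0.2},
    {"sepal_length": 7.0, "petal_length": 4.7, "sepal_width": 3.2, "petal_width": 1.4},
    {"sepal_length": 6.3, "petal_length": 6.0, "sepal_width": 3.3, "petal_width": 2.5},
]
long_petals = brush(flowers, {"petal_length": (4.0, 6.5)})
both = brush(flowers, {"petal_length": (4.0, 6.5), "sepal_width": (3.0, 3.25)})
print(len(long_petals), len(both))
```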
So say I wanted to look at just the red lines, or just some of these green and blue lines: I can narrow it down, and it's a little easier to see. If I want to narrow it down further, say I notice that 3.2 over here looks really interesting. I can click and drag over here as well and scrub both ranges, so I see only the lines that pass through this selection and this one. Where is the crossover? Which data points fall within both ranges? Now it's really easy to see the patterns I'm looking for. This next example is sort of cheating: really, I'm just putting a bunch of standard charts, a bunch of scatter plots, onto one page.
This is called a scatter plot matrix. It lets me look at a bunch of variables, and a bunch of different views of those variables, at the same time. This is actually the same data as in the previous parallel coordinates example, but here you're looking at a scatter plot of each of the four variables against each of the others. For instance, in this corner I have sepal length against sepal length: a row of sepal length and a column of sepal length. So you'll see a perfect correlation here: as one goes up, the other goes up, because it's the same variable. But over here I'm looking at sepal length versus sepal width.
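Each cell of a scatter plot matrix pairs one variable with another, which is why the diagonal cells show a perfect correlation. Here is a minimal Python sketch that computes one Pearson correlation coefficient per cell, assuming the data is stored column by column and no column is constant. The names `pearson` and `correlation_matrix` and the three illustrative flowers are hypothetical; a real analysis would use many more rows.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """One coefficient per (row, column) cell of a scatter plot matrix."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a, b in product(names, repeat=2)}


# Illustrative measurements for three flowers, stored column by column
flowers = {
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_width": [0.2, 1.4, 2.5],
}
matrix = correlation_matrix(flowers)
for name in flowers:
    print(name, [round(matrix[(name, other)], 2) for other in flowers])
```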
Sepal length versus sepal length. And just by looking at it this way, I can see the different correlations and patterns and how different they look. For instance, if I look at petal length versus petal width, there's a really interesting correlation pattern here that I might not see if I looked at this data in a different way. There are many ways, you could even say infinitely many ways, to look at data. So between the basic charts and these alternative forms, you can cover a lot of ground discovering patterns and trends and making comparisons, even when you have what might seem like an overwhelming amount of data.
Released 12/20/2018

- Describe the process by which individuals’ interests are incorporated into data visualizations.
- Differentiate the use of the Ws in data visualization.
- Explain techniques involved in defining your narrative when visualizing data.
- Identify the factors that make data visualizations relatable to an audience’s interests and needs.
- Review the appropriate use of charts in data visualizations.
- Define the process involved in applying interactivity to data visualizations.
Video: The right paradigm: Alternative charts