- [Instructor] So I've talked about how sometimes exploring your data is best done just by starting to generate charts, really just starting to look at the data. A visual representation of the numbers is often the best way for you to see what's interesting. For instance, here we have our minimum wage data. There's no way I could have scanned across this row of numbers and seen what I see here, that it goes up and stays, goes up and stays, goes up and stays. Maybe I could have seen that because these numbers are literally the same, but I certainly couldn't have told you what the pattern is in gas, that it sort of goes up and down and up and down and up and down and then shoots up and shoots down, et cetera.
So visualization is a very powerful tool to explore your data, which is why I start by generating charts. First I generated charts of just the actual numbers themselves, then I looked at the ratio, how much gas minimum wage could buy me in each year, and did the same for the other things, and charted those. But as I mentioned previously, when I tried to chart them all at once I ran into a problem. And that's because the numbers for bread and eggs and gas are all sort of similar, two, four, six, down here, but the unit of electricity is a much bigger number, so looking at them on one chart and making comparisons between them gets very difficult.
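The ratio described above, how much of a good one hour of minimum wage buys, can be sketched in a few lines of Python. This is only an illustration: the function name `units_per_hour` and the numbers in `rows` are made up for the example and are not the course's data.

```python
def units_per_hour(hourly_wage, unit_price):
    """Return how many units of a good one hour of wages can buy."""
    if unit_price <= 0:
        raise ValueError("unit_price must be positive")
    return hourly_wage / unit_price


# Hypothetical (period, hourly wage, price per gallon) rows, NOT the course data.
rows = [
    (1, 3.00, 1.20),
    (2, 3.00, 1.00),
    (3, 4.50, 1.50),
]

# One ratio per period: gallons of gas that one hour of minimum wage buys.
gallons = [units_per_hour(wage, price) for _, wage, price in rows]
print([round(g, 2) for g in gallons])
```

Charting `gallons` instead of the raw wage and price columns shows purchasing power directly, which is the comparison the ratio is meant to surface.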
Which is where indexing can come in and help save the day. So when I explored this data I decided to do an index, because an index essentially will turn all of my values into a value between zero and one. I'm not going to explain exactly how to do indexing here, I'm going to do that in a "convert your data" movie, but here's the basic idea. If I double click you can see the formula: I take my number, how much gas I can buy on minimum wage in 1980, and then I divide that number by the maximum value for how much gas I could buy in the entire data set.
And that turns it into, as I said, a number between zero and one. So here, it turned it into 0.556, or 0.56 essentially, which tells me it's sort of in the middle of the pack. The maximum value in this entire set is wherever I see a one, because there I'm dividing the maximum number by itself, and therefore it's a one. So when you do an index you can always see where the maximum is: it's wherever you see a one. Long story short, if I chart these indices, now I can see how much minimum wage could buy me in terms of gas on an index from zero to one. The chart in this case looks the same, because it's the exact same values, just expressed as a fraction of the maximum instead of as the actual dollar values.
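As a minimal sketch of the divide-by-maximum index described above, assuming all values are positive: the function name `index_by_max` and the sample numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def index_by_max(values):
    """Index a series by dividing every value by the series maximum."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("divide-by-max indexing assumes a positive maximum")
    return [v / peak for v in values]


# Hypothetical gallons-per-hour figures, NOT the course data.
gallons_per_hour = [2.5, 3.0, 4.5, 3.6]

indexed = index_by_max(gallons_per_hour)

# The largest original value (4.5) becomes 1.0; everything else is a
# fraction of that peak, for example 2.5 / 4.5 is roughly 0.556.
print([round(v, 3) for v in indexed])
```

Because every value is divided by the same constant, the shape of the line is unchanged; only the axis labels move to a zero-to-one range.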
Why did I do this? Why does it matter? It's because when I want to chart them all at once, now they're all on the same scale. Now they all have one as the maximum value, and zero as a theoretical minimum value. So I can easily see that the red line here, which is my price of bread indexed, has just sort of gone steadily down, versus my purple line, which is the electricity index, which came down, went up, came down, went up, and then went down a little bit. So even though this chart is hard to read, it's easier to see them all when they're on the same scale.
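To illustrate why indexing helps when charting several series at once, here is a small sketch that indexes each series against its own maximum. The series names and numbers are invented for the example (they are not the course data), and `index_by_max` is a hypothetical helper.

```python
def index_by_max(values):
    """Divide every value by the series maximum (assumes positive values)."""
    peak = max(values)
    return [v / peak for v in values]


# Hypothetical series with very different magnitudes, NOT the course data.
series = {
    "bread": [6.0, 5.0, 4.0, 3.0],
    "electricity": [50.0, 40.0, 60.0, 45.0],
}

# Index each series against its own maximum so every series peaks at 1.0.
indexed = {name: index_by_max(vals) for name, vals in series.items()}

for name, vals in indexed.items():
    print(name, [round(v, 2) for v in vals])
```

After indexing, both series fall between zero and one, so they can share a single axis without the larger series flattening the smaller one.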
Indexing is actually very similar conceptually to the spark lines that I showed you earlier, because the spark lines essentially also turn all of these values into the same scale. In fact, what a spark line really does is remove scale: each line is on a scale only relative to itself. With an index, I can see the lines compared to each other. Each one is still scaled to its own maximum, but they're all on a shared zero-to-one scale, whereas a spark line literally sort of removes all scaling and I can just see the trend in the line itself.
So as I'm exploring the data I do it a bunch of different ways, I visualize different charts, I try different things like ratios and indices, and then the last thing I will always do is I will try different chart types. So I generated a radial chart of all of my values, which is a little bit harder to read. I generated a scatter plot of all of my different values where they're all really clustered together. If I were to sort of shrink the scale on this I might be able to see patterns in here, maybe not. Or this crazy donut chart, which really tells me just about nothing, but you know, you try it, you try the different charts you have available to you in your tools and see what you can see in your data.
I always recommend you don't just try the charts available to you in your tools. Once you think you have an inkling and a hypothesis about your data set, go out, find different tools, try different charts, and get inspiration about different charts that will help you see the things in your data and explore them further. And of course you might need to use more sophisticated data analysis tools than what I'm showing here. This is not a class in data analysis, this is a data visualization class, and so I'm using visualization as my primary mechanism for exploring my data set.
Released 12/20/2018
- Describe the process by which individuals’ interests are incorporated into data visualizations.
- Differentiate the use of the Ws in data visualization.
- Explain techniques involved in defining your narrative when visualizing data.
- Identify the factors that make data visualizations relatable to an audience’s interests and needs.
- Review the appropriate use of charts in data visualizations.
- Define the process involved in applying interactivity to data visualizations.
Video: Explore your data: Indexes and ratios