Join Bill Shander for an in-depth discussion in this video, "Find the story in your data," part of Data Visualization: Storytelling.
- [Instructor] If you're a data analyst, you're familiar and comfortable rooting through data looking for patterns and trends and outliers, but are you comfortable thinking about the stories you might be telling with that information? And for those of you who aren't data people, even digging through the data itself might seem a bit overwhelming. In this video I wanna show you just a bit about how to work with data to tease out the stories, whether you're a data analyst or just a regular person. To do this we're gonna be looking at some real data. There are a million and one great resources for data out there. First, I would recommend that you work with data you find interesting, especially if you're doing this on the side. Hopefully, if you're working with data for your organization, you find that interesting too.
I like looking at a lot of social issues data and I'm starting this exercise thinking about one of the big issues of the day which is income distribution. It's a major component of every political conversation around the world these days. Of course the two sides of the argument have very different ideas about whether income equality needs to be a goal and if so, what the prescription is to solve it. But hey, we're data people, our task is simply to look at income distribution and try to get a sense of the data itself, no politics here at all. I'm gonna focus today's efforts on the United States, and the best place to look for this data is the US Census Bureau.
First I'm searching the web for US income statistics, so I'm more likely to land on the right page on the census website. And so first link here, I'm looking at income, people in households, US census, that looks good to me. If I click through here I can find historical data, right, over here or over here. If I click into historical data I can find household data very easily. So we'll go to the share of aggregate income table, and I'm gonna click into all races 'cause we're not looking at race, we're looking at all the data. And I download a spreadsheet and open that up. This spreadsheet will be in the exercise files folder.
I've removed the columns I don't care about, I just have the year and then each fifth of income earners. It starts at 1967 and then goes down to 2014. So I've generated a bunch of charts, I'm just gonna zoom out a little bit so we can see them side-by-side. First thing I did is I looked at the actual values, so in other words these numbers here, columns B through F, and I said, okay, let's do an area chart of each of those. So for the bottom fifth I can see that their income share has shrunk, as I would expect, same thing for the second from the bottom fifth, the third from the bottom fifth and the fourth from the bottom fifth.
In other words, most of the country's share of income has shrunk except for the top 20%, the top fifth, whose share of income has actually gone up. But the problem with looking at this data in this particular way is that what I'm looking at is their actual values. And as you'll notice there are a bunch of problems with this. First of all, if I look at, say, the bottom fifth, it goes from zero up to about 4.3% or so at the highest. Whereas let's say on the top fifth it goes from zero all the way up to about 50 something.
So the scales are completely off, these are not apples to apples comparisons. I can see compared to themselves whether they've shrunk or grown, but it's really hard to compare them to each other. So the next thing I did, of course, was I changed these into percentage changes. So I generated these extra columns here, and you know, we don't have to go into Excel skills here too deeply, but I just ran a bunch of formulas. Essentially I took, you know, whatever value I'm looking at, so in this one I'm looking at row 18, column H, and I created a new value. This is for 1983: I took the 1983 value, subtracted the 1967 value and then divided that by the 1967 value, so that gives me the percentage change from 1967 all the way through until 1983.
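The percentage-change step described above can also be sketched in Python rather than spreadsheet formulas. This is only a minimal sketch, not the formulas from the exercise file: the function name `percent_change_from_base` and the sample shares are made up for illustration and are not the Census figures.

```python
def percent_change_from_base(values):
    """Return each value's percent change relative to the first (base) value."""
    base = values[0]
    return [(value - base) / base * 100 for value in values]


# Placeholder income shares for illustration only, NOT the Census figures.
example_shares = [4.0, 4.2, 4.0, 3.6]

# The first entry is always 0 because the base year is compared to itself.
changes = percent_change_from_base(example_shares)
```

Because every series is divided by its own base-year value, the results share one scale, which is what makes the fifths comparable to each other.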
And so the bottom fifth didn't change at all, interestingly, whereas the next fifth went down by 8.3%. And then if I scroll all the way down to row 49 I can see that the bottom quintile has gone down by 22.5% and the top quintile has gone up by 17.43%. After generating those actual percentage change values, if I scroll further to the right in this exercise file, once again I generated the actual charts. And so if I zoom out a tiny bit more what you can see is what that looks like.
And now I am comparing apples to apples. So I can see the percentage change. So the bottom quintile, their share of income actually rose in the late 60s and 70s, and then started to go down again in the 80s and through the 90s and 2000s. Same thing with the next fifth and then the next fifth and then the next fifth until you get to the top fifth, whose income went down a little bit in the 60s and then started going up in the 70s and certainly into the 80s and 90s. As you look at the data this way, and now we're comparing apples to apples, there are all kinds of interesting things that start to jump out.
In addition to that overall change story I see something really interesting. If I really look closely at the data, at data point 26 on the x-axis, and again, it's not labeled properly, but these are actually the years from 1967 all the way through 2014 if you remember. At data point 26 on all of these there's a dramatic drop or rise depending on which set you're looking at. So what's going on there? That's actually 1992. As I'm trying to figure out what story I'm gonna tell with this data I see something interesting going on in the data and I might say to myself, there's gotta be something there that I could use to tell a story, something maybe happened in 1992, so what might I do now with the data? Maybe now, instead of looking at the quintiles, right, the bottom and the top 20 percents, maybe I wanna look at 10 percents, maybe I wanna look at deciles.
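Since the chart's x-axis shows point numbers instead of years, here is a rough sketch of how to translate a point number back to a year and how to spot the biggest year-to-year jump in a series. It assumes one data point per year and numbering that starts at 1, which matches point 26 landing on 1992 when the series starts in 1967. The function names `year_for_point` and `largest_jump` are hypothetical helpers, not part of the exercise file.

```python
START_YEAR = 1967  # first year in the series, per the transcript


def year_for_point(point_number, start_year=START_YEAR):
    """Map a 1-based data point number on the x-axis to its calendar year."""
    return start_year + point_number - 1


def largest_jump(values, start_year=START_YEAR):
    """Return (year, change) for the largest absolute year-over-year change."""
    jumps = [
        (start_year + i, values[i] - values[i - 1])
        for i in range(1, len(values))
    ]
    return max(jumps, key=lambda pair: abs(pair[1]))


# Point 26 in a series that starts in 1967 corresponds to 1992.
year_at_point_26 = year_for_point(26)
```

Running `largest_jump` on each fifth's column is one way to check whether the same year stands out across all of them.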
And so, looking at, you know, every 10% bit, how much have they gone up or down? Or maybe I wanna look at data all around 1992, maybe from '89 through '93 or something and say, what happened in 1992? So I might change my headline, I might find a different story here that I really wanna start to investigate. I'm not gonna go through this entire exercise, that's not the point of this particular movie. The point is to say that you can start looking at your data, you're gonna start to find interesting things, and as those stories come out you might change directions, you might find other stories to dig into and that's what it's all about.
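Zooming in on the years around 1992 amounts to filtering the rows by year. A minimal sketch, assuming the data is held as (year, value) pairs; the helper name `rows_in_window` and the values below are placeholders for illustration, not the Census figures.

```python
def rows_in_window(rows, first_year, last_year):
    """Keep only the (year, value) rows whose year falls inside the window."""
    return [(year, value) for year, value in rows if first_year <= year <= last_year]


# Placeholder values for illustration only, NOT the Census figures.
example_rows = [
    (1988, 1.0), (1989, 1.1), (1990, 1.2), (1991, 1.3),
    (1992, 1.4), (1993, 1.5), (1994, 1.6),
]

# Both ends of the window are inclusive, so this keeps 1989 through 1993.
around_1992 = rows_in_window(example_rows, 1989, 1993)
```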
If you are a data analyst, then this of course is very simplistic, right, this is far too simplistic for you possibly. The point for you is to think about how you're gonna focus on storytelling while you're analyzing your data. What questions are you asking, what answers are you finding? You're gonna need to be able to assemble these things into some sort of a logical flow with a focused storyline. So as you do discover interesting things in your data, don't be afraid to bookmark 'em, put 'em aside. You know, even if they don't fit the flow of a story, the data might be interesting, and maybe you can use that in another story.
Maybe you have two stories to tell and that's really more than okay. If it's something, if it's a tangent that you can go on and tell a cohesive story around that tangent, go for it.
Join data visualization expert Bill Shander as he guides you through the process of turning "facts and figures" into "story" to engage and fulfill our human expectation for information. This course is intended for anyone who works with data and has to communicate it to others, whether a researcher, a data analyst, a consultant, a marketer, or a journalist. Bill shows you how to think about, and craft, stories from data by examining many compelling stories in detail.
- Creating a narrative structure for data
- Applying narrative to data
- Identifying what you want to say with the data
- Analyzing what your data is saying
- Determining what your audience needs to hear
- Leveraging tables, charts, and visuals
- Ensuring your narrative provides context and direction