- Welcome to the first video in this series. Today I'm gonna talk about data storytelling for regular folks. I'm gonna start by looking at a report provided by the government. I love looking to government reports for source material because they're publicly available and there are no copyrights, so I'm free to examine and critique them. Well, hey, I paid for this thing, right, with my tax dollars. So, today I'm gonna look at a report that came out in November 2017. This report is from the White House Council of Economic Advisers, and its title is Evaluating the Anticipated Effects of Changes to the Mortgage Interest Deduction.
That's quite a mouthful. This report looked at the then-proposed changes to the ability to deduct mortgage interest from taxes. Specifically, the mortgage interest deduction was set to be capped at $500,000 of mortgage debt. Prior to the change, that cap was $1 million. The final bill as passed ended up setting the cap at $750,000. Before we talk about the report, let me talk about what makes a data story work for regular folks. Regular folks are the general public: non-experts, people who don't eat and breathe data all day long like I do.
The critical ingredients of a data story for an audience like that are labeling, narration, context-setting text, sequencing, and thoughtful, simple visualization and design. So, I'm gonna look at this report with that lens in mind. Now, I'm not an economist, so I'm not qualified to argue one way or the other, with any authority, about the policy or even the data itself, and I won't. What I am interested in is the data storytelling.
Feel free to download the report and take a look before I start talking about it. The executive summary at the beginning of the report does a pretty good job of introducing the subject and arguing the two primary points. First, it presents the idea that the change to the mortgage interest deduction, the MID, can't be considered without also considering the proposed changes to the standard deduction, which is the amount everyone gets taken off the top of their income for tax purposes by default. That's because when the standard deduction increases, fewer people will itemize their deductions, and the mortgage interest deduction can only be taken if you itemize.
So the report argues that since the standard deduction will be increasing, doubling in fact, the share of tax filers who itemize deductions will fall from 26% to just 8%. So a lot fewer people will be affected by this change than one might assume. The summary also quickly mentions, but doesn't dig into, the fact that the most affected people will be higher-income taxpayers. Next, the summary notes that the MID is broadly justified as a tool to help increase home ownership.
However, the report argues, this justification is not backed up by research. While the MID does not affect home ownership rates, it does affect the size of the homes people buy and the prices they're willing to pay. It encourages us to buy larger and more expensive homes. This leads to inflated home prices in some markets, which, the report argues, actually leads to lower home ownership rates, since those higher prices put home ownership out of reach for some people. So, the summary argues, the MID changes could actually increase home ownership while decreasing home prices, particularly in high-income, high-priced housing markets.
The executive summary does a pretty good job of summarizing the report's key points in order, and it's mostly pretty accessible to the average reader. That being said, some of the language is a bit odd. For instance, the report calls taxpayers "tax units," which may be normal speak in the universe economists hang out in, but it's certainly disconcerting to regular folks like me. I don't think of myself as a tax unit. It's a bit abstract and jargony. Assuming the executive summary really is a preview of the data story, the rest of the report should cover two ideas: that the number of people itemizing deductions will come down, and that home ownership rates will be helped, not harmed, by a reduced MID.
So, let's continue. Indeed, the full report starts by explaining the planned changes in a bit more detail, and then goes into the data behind the first point: as the standard deduction increases, fewer taxpayers itemize their deductions, so fewer people take advantage of the MID. The report provides a table showing the percentage of people itemizing deductions under then-current law compared to the estimates under the proposed law. Why they chose a table of numbers instead of a chart is a bit puzzling, since a visual of this data would probably be easier to understand.
It's easy enough to glance up and down the list and see that the percentages are lower pretty much across the board, but what does it mean? If I visualize the numbers, as in this line chart, I can more easily see that the percentage of people itemizing deductions goes way down at all income levels. However, what's less clear is what this means for these people. Are they saving money? Is this ultimately good for them? Is there a difference in the financial impact for people in one bracket compared to those in another? Perhaps there's another way to visualize this data that would be more informative.
Maybe seeing the total number of people in each bracket, rather than just the percentages, would be more helpful. I'm not sure. But in the end, I'm not gonna criticize this too deeply, because the two primary points of this report are not about the overall financial impact of the proposed change. So, let's move on. The report continues by talking more about those income brackets, and this chart is the centerpiece of the explanation. This chart is confusing for a few reasons. First, we have a left-hand y-axis that isn't labeled.
I have no idea what those numbers represent. The chart title, the footnotes, the data itself: they all leave me guessing what those numbers really mean. My best guess is that they're percentages, meaning that, for example, about 35% of loans are in the $100,000 to $200,000 range. But the axis should be labeled that way so I'm not left guessing. Which brings up another labeling problem. We have dollar amounts on the x-axis and on the right-hand y-axis. One is the loan amount, and the other is the median applicant income for those loans.
That's just confusing, even though you can figure it out if you dig around the chart and really think about it. By the way, important side note: there's nothing wrong with making your audience think, okay? But we wanna reserve that for the important stuff, not for deciphering a poorly conceived chart. One simple thing that can help: take the red line color, which is nicely repeated in the inline label here, and use that same color for the y-axis labels themselves. While we're at it, let's add a label to make it clear the x-axis is depicting loan amounts, and let's not make our audience tilt their heads to read those numbers, okay? You never wanna do that.
By the way, I'm not even gonna talk about the fact that this is a chart with two y-axes, which I generally frown upon and recommend against. Let's go back to what's actually being explained here. The argument is that most loans, something around 90% of them, are for less than $500,000, which, remember, is where the new cap is gonna be, and that the median income of people taking out those loans is under $150,000. So, if you really work at it, assume the left-hand axis is percentages, and do some math, you might get to that conclusion.
But that isn't the argument the text actually makes. The text makes other points that are even harder to pull from this chart, and it even draws on different data. But let's continue to focus on this chart. At a minimum, I would have driven this point home by labeling the chart more thoughtfully, making sure the visualized data says something useful and obvious, rather than drawing the eye to something that provides as much confusion as evidence for the story. So, let's make a couple of final changes.
Let's highlight the area that's the focus of the chart: the 90% of loans that are for people with incomes below $150,000. Let's add a little text to both sections explaining the exact point the chart is supposed to make. And while we're at it, let's get rid of the pesky grid lines that are on all of these charts. They're a visual distraction and serve no purpose here. I also changed the chart title to say the most important thing I wanna say about the chart. Rather than just describing the data, as the original title did, and as a lot of chart titles do, by the way, my new title says something the audience actually cares about.
You could easily make the case that by sequencing a series of charts, first showing the 90% of loans, then the 10%, then the median income, you could do an even more effective job of explaining the story piece by piece. Okay, so that's better, but it still doesn't address my main pet peeve with this visual. I would argue it's not even representing the right data. The text surrounding this chart in the report touches on the number of people affected and their income brackets, but it also emphasizes a study showing that lowering the MID would result in lower house prices and, therefore, more home ownership opportunities for lower-income people.
But this data is nowhere to be seen. The study is mentioned, but its data is never actually shown. I would expect a report like this to include a better explanation of this argument. Show me those income brackets and the prices of their homes, and then show me how those prices would change. How many people will benefit from being able to buy a new home, etc.? Without that information, this is a data story that uses some, but not all, of the data to make its points. The rest of the report continues to do a credible job of arguing its points, referring to research and data to back them up,
and it provides a couple more visualizations to help the cause. There are some rookie data visualization mistakes, such as this chart on page eight of the report, which uses a line to connect data points on a categorical chart. Is there a progression from Portugal to Switzerland? No? Then don't use a line, okay? Still, the data story is pretty solid overall. As I mentioned, there's a clear focus on a few key points. There's a flow of ideas from one to the next, a logical, linear story.
There's data driving the entire story, with visuals to help explain that data. And there's a conclusion that wraps the story up. That being said, a data story like this, especially for regular folks, would need a lot of language adjustments, much more clearly explained visuals, some cleanup of visualization mistakes, better alignment between the data and visuals displayed and the arguments being made, and much more thoughtful labeling and captioning of the visuals, so they clarify the story rather than introduce more questions.
This report provides a good object lesson in data storytelling for regular folks, using a real-world example. This is our government, which is supposed to be by and for the people, making decisions that will affect every one of us. It doesn't get more regular folks than that. Up next, we'll be talking to Neil Halloran, a master data storyteller who loves to cover complex and interesting topics in ways that are definitely meant to speak to regular folks like us.