The last several sections of movies have dealt with methods for examining one variable at a time with graphs, descriptive statistics, and inferential procedures. These kinds of univariate analyses can be very interesting in their own right, such as the number of people who will vote for a particular political candidate or the amount of money spent on chewing gum in the US each year, which I once heard is $500 million. And they form a truly essential part of any further analysis. That is, they are the foundational background pieces of an analysis.
So before you look at any combinations of variables, you need to understand each variable on its own. But with that said, it's the associations between variables that are often of the most interest to people. For example, I am also told that people chew gum more often during times of social unrest. Now, you can make of that what you will, but it gets at the heart of the great majority of real-world data analysis: how can you predict or explain one thing based on another? And as a first step to understanding associations, like we did with univariate analyses, we're going to start where you should always start in an analysis: with a picture.
One of the easiest kinds of charts for showing associations is the clustered bar chart, which is particularly well suited for showing the relationship between two categorical variables, that is, nominal or ordinal variables. We covered simple bar charts earlier when we looked at univariate charts, and they can be just as useful here. In fact, the only real difference is that we will now cluster the bars by grouping them on the axis across the bottom. While the difference may seem small, it really opens up a lot of analytical possibilities in SPSS.
Now, to demonstrate this, I am going to be using the data set Searches.sav, about Google searches and how they vary from state to state. In this particular example I am going to look at two variables that are near the end on the right. I am going to look at whether a state has an outline for a high school statistics class, and I am going to compare that to the region of the country that the state is in. There are four regions, so that's a categorical variable with four categories, and statistics education is a dichotomous yes/no.
And I am going to look and see if the proportion of states with a statistics curriculum varies from one region to another. Now, to do that, I am going to go up to Graphs, to the Chart Builder, and I am going to come down to Bar and choose the clustered bar chart. I am going to drag that up to the canvas, and then I need to take one variable and put it on the X-axis and use the other variable to set the colors of the bars. I am going to put the region on the X-axis, if for no other reason than that I have four regions and I don't want to have four different colors in my chart. But you're also going to see how this allows me to make a yes/no comparison more easily within each group.
What I am going to do is I am going to get the region variable, which is near the bottom of the dataset. That's this one right here, the Census Bureau Region. I am going to drag that down to X-axis and then for this one on the top-right that says Cluster on X: set color, I am going to take whether they have an outline for high school statistics. That's this variable right here. So I am going to drag that over to cluster, and I think that's all I really need right here. So I am going to come down and click OK. When we first get the output, we get a lot of text.
This is the command that you could write to produce this chart. Beneath that is the chart itself. It's just blue and green bars, and what it has is a pair of bars for each Census Bureau Region, from the Northeast, to the Midwest, to the South, to the West. The blue bar means that the state does not have an outline for a high school statistics class, and the green bar means that it does. There are a couple of things that jump out immediately. First is that in the Northeast not a single state has an outline for a high school statistics class.
The Midwest has just one, and the West has just three, but in the Southern region there are more states that have outlines for high school statistics than there are without them. That's extraordinarily unusual. That's a very different pattern. Now, there is one challenge with this particular chart, and that is that there is not the same number of states in each region, so it can be a little difficult to compare from one to the other. Fortunately, the Bar Chart command lets us do something significant here.
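As a rough illustration outside of SPSS, the bar heights in a clustered bar chart of counts are just the cells of a cross-tabulation of the two variables. The sketch below uses invented records (they are not the values in Searches.sav, and the names `records` and `bar_heights` are hypothetical) to show how the regions can differ in size, which is what makes raw counts hard to compare.

```python
from collections import Counter

# Invented (region, has_outline) records, one per state.
# These are placeholder values for illustration, NOT the data in Searches.sav.
records = [
    ("Northeast", "No"), ("Northeast", "No"),
    ("Midwest", "No"), ("Midwest", "No"), ("Midwest", "Yes"),
    ("South", "No"), ("South", "Yes"), ("South", "Yes"), ("South", "Yes"),
    ("West", "No"), ("West", "No"), ("West", "Yes"),
]

# Each (region, answer) pair corresponds to one bar; its height is the count.
bar_heights = Counter(records)

regions = ["Northeast", "Midwest", "South", "West"]
for region in regions:
    no = bar_heights[(region, "No")]
    yes = bar_heights[(region, "Yes")]
    # The region totals differ, so equal counts do not mean equal shares.
    print(f"{region:<10} No={no}  Yes={yes}  states in region={no + yes}")
```

In this toy data the Midwest and the West both have one "Yes", but whether that is a large or small share depends on how many states each region contains.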
What I am charting right now on the side is the counts. That's the number of states that do or do not have an outline for a high school statistics class. I am going to change that, though, to be a percentage, and here's how that's going to work. I am going to go back to Graphs, to the Chart Builder, and I am going to pick up where I left off, except right here it says Count on the side, and if I go over to the Element Properties window where it says Bar, right here under Statistics it says Count.
If I click on that, I actually have a huge number of options. I can specify a tremendous number of things. What I am going to do is click Percentage. Now, the reason that has a question mark in parentheses is that I need to set the parameters for the percentage. It's asking me: a percentage of what? I click on that. I don't want the grand total. What I do want is each X-axis category, that is, each region. I want to know what percentage of the states in each region do or do not have a high school statistics curriculum.
So I am going to click on that one and press Continue, then I come down to the bottom of the Element Properties window and press Apply, then back over to the main window and press OK. We get the text output, and then I scroll down and I have another chart. You can see this one looks slightly different, and that's because it's adjusting for the differences in the sizes of the regions. We still see that in the Northeast none of the states have an outline for a high school statistics class. That's why the blue bar, the No, goes all the way up to 100%. In the Midwest, only about 10% of the states have a curriculum, in the South, over 50% do, and in the West, it's just over 20%. That's another way of adjusting for differences to make the chart a little easier to interpret. You usually want to compensate for the differences in the sample sizes and look at the percentages or the rates in a particular area, and one of the beautiful things about SPSS is how easy it makes that particular procedure.
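The "percentage of what?" question comes down to the choice of denominator. Here is a minimal sketch in Python, again with invented counts rather than the values in Searches.sav, contrasting a percentage of the grand total with a percentage within each X-axis category. The function names are hypothetical and this is not how SPSS computes it internally, only the same arithmetic.

```python
# Invented counts per region (placeholder values, NOT the data in Searches.sav).
counts = {
    "Northeast": {"No": 2, "Yes": 0},
    "Midwest": {"No": 2, "Yes": 1},
    "South": {"No": 1, "Yes": 3},
    "West": {"No": 2, "Yes": 1},
}

# Denominator 1: every state in the data set.
grand_total = sum(sum(cell.values()) for cell in counts.values())


def pct_of_grand_total(region, answer):
    """Share of ALL states that fall in this region/answer cell."""
    return 100 * counts[region][answer] / grand_total


def pct_within_region(region, answer):
    """Share of the states in this region giving this answer."""
    region_total = sum(counts[region].values())
    return 100 * counts[region][answer] / region_total


for region in counts:
    print(
        f"{region:<10} "
        f"Yes as % of grand total: {pct_of_grand_total(region, 'Yes'):5.1f}  "
        f"Yes as % within region: {pct_within_region(region, 'Yes'):5.1f}"
    )
```

With the within-region denominator, the "No" and "Yes" bars for each region add up to 100%, so regions of different sizes can be compared directly.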
So the first kind of association chart that we've covered, the clustered bar chart, is a small variation on a univariate bar chart, and it's a great way of showing the association between two categorical variables. This command makes a very clean, simple, and easy-to-interpret chart, which is the real goal of data visualization and statistical graphics. In the next movie, we will look at using scatter plots to show the associations between two scale variables.