Up to this point, we've covered methods for looking at one variable at a time as well as methods for looking at the associations between pairs of variables. In each case and consistent with good analytical practices, we started with charts because data is usually much easier to understand visually. Then we've done numerical descriptions of the variables and associations, and finally, we've done inferential statistics to generalize beyond the given data. In these last few sections, we'll take that pattern one more step by looking at methods for exploring the relationships of three or more variables, first with graphs and then with numbers.
A quick word about terminology is in order. When you look at one variable at a time, it's called a univariate analysis. When you look at the associations between pairs of variables, it's called a bivariate analysis. Therefore it would make sense that when you're looking at multiple variables, it would be called a multivariate analysis. However, that term multivariate is typically reserved for situations where you specifically have more than one outcome variable. Those kinds of statistics are much, much more complicated than what we're going to be doing, which is using more than one predictor variable with a single outcome variable.
So I will generally avoid the term multivariate and instead just talk about multiple variables. With that in mind, let's look at our first chart for multiple variables. And just like when we did charts for one variable or pairs of variables, we'll begin with a bar chart for categorical variables. Just this time, we'll have three categorical variables. To demonstrate this, I'll use the General Social Survey dataset in GSS.sav. What we need to do is begin by going up to Graphs in the menu bar and then down to Chart Builder.
Then we come down to Bar, except instead of Simple, we're going to use Clustered this time. So I drag the Clustered bar chart up to the canvas. What we're going to look at as an outcome variable in this particular example is a person's self-rated happiness. Sometimes the easiest way to look at your outcome variable is to let it set the colors of the bars. So I'm going to take self-rated happiness and I'm going to drag it over to Cluster on X: set color. Then we need a categorical variable on the X-axis.
I thought it would be interesting to see whether a person had attended a live drama in the last year. I'll put that on the X-axis. So that's two categorical variables, using attendance at a live drama to predict self-rated happiness, but that's just two variables. We need a third one and to do that, we have to come down to this tab that says Groups and Point ID. I click on that, then I come down to either adding a Rows panel variable or a Columns panel variable. And all that influences is whether the charts show up one above the other or one next to the other.
In order to keep it compact, I'm going to do a Rows panel variable. Then I need to add one more variable that creates pairs of charts. And I'm going to use gender. I'm just going to come right up here to this one that says Male and drag that over here. And so you see what I'll end up with is four groups of three bars. Now I just come down to OK and I can make the chart. There is a lot of code that goes into that, and we can save that for future reference.
And then what we have here is bar charts. On the left, we have people who have not attended a live drama in the last year. More people have not. It's about 3:1. And then on the right are people who say they have attended one. The top two are for women. The bottom two are for men. The blue bars are not too happy, the green bars are pretty happy, and the beige bars are very happy. We do have one small problem with this chart and that is that a much smaller number of people have seen a live drama in the last year, so the bars on the right are short and hard to compare.
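To make concrete what the counts version of the chart is displaying, here is a minimal sketch in Python (not SPSS syntax) that tallies how many cases fall into each combination of three categorical variables. The records and variable names are invented for illustration and are not taken from GSS.sav.

```python
from collections import Counter

# Hypothetical survey records: (gender, attended_drama, happiness).
# These values are invented for illustration; they are NOT GSS data.
records = [
    ("Female", "No", "Pretty happy"),
    ("Female", "No", "Pretty happy"),
    ("Female", "No", "Very happy"),
    ("Female", "Yes", "Very happy"),
    ("Male", "No", "Pretty happy"),
    ("Male", "No", "Not too happy"),
    ("Male", "Yes", "Pretty happy"),
]

# Each distinct (gender, attended, happiness) combination is one bar;
# its count is the bar height in a counts-based clustered bar chart.
cell_counts = Counter(records)

for (gender, attended, happiness), n in sorted(cell_counts.items()):
    print(f"{gender:6} | attended: {attended:3} | {happiness:13} | {n}")
```

Each printed row corresponds to one bar: gender picks the panel, attendance picks the position on the X-axis, and happiness picks the color within the cluster.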
That's because we're charting counts here. A really handy feature in SPSS is the ability to chart percentages as well. So I'm going to show you how to go back and do that. I'm going to come back up to our most recent command, to Graphs, to Chart Builder, and then here in Element Properties, I have Statistics and it says Count. That's how many people are in each category. I'm going to click on that and instead I'm going to go to Percentage and that has a question mark because I have to set a parameter over here. I find the most helpful one is each X-axis category.
So what this'll do, it'll make things add up to 100% for those who have and for those who have not seen drama. So I select that. I click Continue. I have to come down here and press Apply and then I come over here and press OK. And what you'll see now is that the chart will look slightly different. The biggest difference is that the bars on the right side, for those who have seen live drama in the last year, are much larger than they were before because using percentages has equalized the two groups and it makes it much easier to see the pattern.
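The effect of that denominator choice can be sketched outside SPSS. The Python below is only an illustration, assuming that the denominator is the number of cases in each X-axis category within a panel; the function name and the data are hypothetical and are not from the GSS file.

```python
from collections import Counter

def percent_within_groups(records):
    """Return {(panel, x_category): {outcome: percent}}.

    Each record is a (panel, x_category, outcome) tuple. Percentages
    are computed so the outcomes sum to 100 within each
    (panel, x_category) group.
    """
    cell_counts = Counter(records)
    group_totals = Counter((panel, x) for panel, x, _ in records)
    result = {}
    for (panel, x, outcome), n in cell_counts.items():
        group = result.setdefault((panel, x), {})
        group[outcome] = 100.0 * n / group_totals[(panel, x)]
    return result

# Hypothetical data: the "No" group is four times larger than the "Yes" group.
example = (
    [("Male", "Yes", "Very happy")] * 1
    + [("Male", "Yes", "Pretty happy")] * 3
    + [("Male", "No", "Very happy")] * 4
    + [("Male", "No", "Pretty happy")] * 12
)

percentages = percent_within_groups(example)
for group, shares in sorted(percentages.items()):
    print(group, shares)
```

In this made-up example the two attendance groups differ fourfold in size, so their count bars would look very different, yet their percentage bars are identical. That is the sense in which percentages put groups of unequal size on the same footing.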
For instance, we see that among those who attended a live drama in the last year, interestingly, the percentage of men who are very happy is smaller than the percentage who are pretty happy. For women who have seen a drama in the last year, on the other hand, the percentage who are very happy is slightly higher than the percentage who are pretty happy. And for those who have not seen a drama, the patterns are nearly identical for men and for women: most people are pretty happy, the next group is very happy, and the least common is not too happy.
A clustered bar chart can be a handy way to depict the relationships among these three categorical variables. You'll probably want to chart percentages instead of counts, but be aware that your choice of denominator can make a big difference in how the final chart looks. This gets back to a point that data analysis is probably best thought of as a form of storytelling and you want to choose displays that help you tell your story well or that help the data tell you something interesting and unexpected. It's worth noting that if your outcome variable is a dichotomous indicator variable, that's a 0/1, yes/no variable, then you can sometimes make things easier by charting the mean of the outcome, which for a 0/1 indicator variable will be the proportion of people who got 1s, for example, the proportion who are returning customers as opposed to first-time customers.
And this leads us to the next chart we'll cover, the clustered bar chart for means.
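The remark about 0/1 indicator variables can be checked with a few lines of arithmetic. This is a small Python sketch with invented values, where 1 stands for a returning customer and 0 for a first-time customer.

```python
# Hypothetical indicator values: 1 = returning customer, 0 = first-time customer.
returning = [1, 0, 1, 1, 0, 0, 1, 0]

# Mean of the indicator: add up the values and divide by the number of cases.
indicator_mean = sum(returning) / len(returning)

# Proportion of 1s: count the cases coded 1 and divide by the number of cases.
proportion_returning = returning.count(1) / len(returning)

print(indicator_mean, proportion_returning)
```

Because every 1 adds exactly one to the sum and every 0 adds nothing, the sum is the number of 1s, so the two quantities are always the same.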