From: SPSS Statistics Essential Training (2011)

It's usually a good idea to enter your data in its least processed, most disaggregated form: put the raw data in, and do any processing you need in SPSS. That way you can combine things later if you want. On the other hand, if you bring the data into SPSS in an aggregated, combined, or summary form, you can't break it down later. One way of dealing with data that you want to aggregate, as long as you are working with nominal or categorical variables, is the Multiple Response function.

It's one of the neat tricks in SPSS. This function combines the responses from several variables and lets you create frequency tables and cross-tabulations as though they were a single variable. In many circumstances, this can make life much easier. The first thing to say is that you can organize the data in a couple of different ways, and Multiple Response can handle either one. In this data set, Tickets.sav, I have hypothetical data about the purchase of season tickets to seven different kinds of events.

I have Baseball, Basketball, and Football, as well as the Symphony, the Opera, the Theatre, and the Ballet. The idea is that we might want to look at what kinds of season tickets people have, how many they have, and whether, for instance, their preferences differ by the buyer's gender and age. Again, this is hypothetical data. I have it set up first with each possible event, the three sports and the four cultural events, as an indicator variable.

So you see here for Baseball we have Yeses and Nos for whether a person has season tickets to Baseball, and likewise for Basketball and Football. Then I have a column that adds up how many sporting events each person has season tickets to. The first person has season tickets to two sporting events, Baseball and Football; the second person has none. Then I have the four cultural events. I'll scroll over a little so you can see all of it, and I do the same thing there: I add up how many cultural tickets each person has. Finally, I have a column that combines the sports and the cultural events, showing how many season tickets each person has altogether. I'm being a little optimistic, but this is how it works.

So this is a series of what are called dichotomous indicator variables. Dichotomous means there are just two possible values, such as yes/no or male/female, and an indicator variable is a 0/1 variable, where 0 is no and 1 is yes. In fact, if I go up to the toolbar and click the Value Labels button, you'll see the 0s and 1s underneath the labels. When I turn the Value Labels back on, you see the Yeses and Nos again. So with indicator variables, I list every possible choice and record a Yes or No for each person.
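Because 1 means yes and 0 means no, the "how many sports tickets" and "how many tickets altogether" columns are just sums of indicator variables. Here is a minimal Python sketch of that idea, written outside SPSS for illustration. The rows are made up, not the Tickets.sav values, and the summary column names SportsCount, CultureCount, and TotalCount are hypothetical.

```python
# Hypothetical rows in the indicator (dichotomy) layout: one 0/1 column per
# event, with 1 = has season tickets ("Yes") and 0 = does not ("No").
# These rows are made up for illustration; they are not the Tickets.sav data.
SPORTS = ["Baseball", "Basketball", "Football"]
CULTURE = ["Symphony", "Opera", "Theatre", "Ballet"]
VALUE_LABELS = {0: "No", 1: "Yes"}  # what the Value Labels toggle displays

rows = [
    dict(Baseball=1, Basketball=0, Football=1,
         Symphony=0, Opera=0, Theatre=0, Ballet=0),
    dict(Baseball=0, Basketball=0, Football=0,
         Symphony=0, Opera=0, Theatre=0, Ballet=0),
    dict(Baseball=0, Basketball=1, Football=0,
         Symphony=1, Opera=1, Theatre=0, Ballet=0),
]


def add_counts(row):
    """Return a copy of the row with three summary columns added.

    SportsCount, CultureCount, and TotalCount are hypothetical names.
    Summing works only because "yes" is coded as 1 and "no" as 0.
    """
    out = dict(row)
    out["SportsCount"] = sum(row[e] for e in SPORTS)
    out["CultureCount"] = sum(row[e] for e in CULTURE)
    out["TotalCount"] = out["SportsCount"] + out["CultureCount"]
    return out


with_counts = [add_counts(r) for r in rows]
for r in with_counts:
    # Labelled view of the sports columns, printed next to the three counts.
    labelled = {e: VALUE_LABELS[r[e]] for e in SPORTS}
    print(labelled, r["SportsCount"], r["CultureCount"], r["TotalCount"])
```

The underlying values stay 0 and 1, so the sums remain available; the Yes/No labels are only a display layer on top of them.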

The other way of organizing multiple response data is to have one variable for each choice, up to the maximum number of choices any person makes. In this hypothetical data set, nobody had more than four sets of season tickets, so I have Tix1, Tix2, Tix3, and Tix4, one column for each set of season tickets a person holds. Each of these columns can take any of the seven event codes: I put down the first event, then the second, and if that's all they have, I put 0s in the rest. You can also see that some people, around case 16, have no season tickets at all.
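The two layouts hold the same information, so one can be converted into the other. The following Python sketch assumes the event codes 1 through 7 follow the order in which the events were listed (Baseball = 1 through Ballet = 7); the real coding in Tickets.sav may differ, and the function names are hypothetical.

```python
# Assumption: codes 1-7 follow the order the events were listed; 0 = empty slot.
EVENTS = ["Baseball", "Basketball", "Football",
          "Symphony", "Opera", "Theatre", "Ballet"]
MAX_TIX = 4  # nobody in the example holds more than four sets of tickets


def to_category_row(indicator_row):
    """Indicator layout -> Tix1..Tix4 codes, padded with 0 for unused slots.

    Codes are filled in event order, so this sketch does not preserve any
    "first purchased, second purchased" ordering the real data might record.
    """
    codes = [i + 1 for i, e in enumerate(EVENTS) if indicator_row[e] == 1]
    if len(codes) > MAX_TIX:
        raise ValueError("more choices than Tix columns")
    codes += [0] * (MAX_TIX - len(codes))
    return {f"Tix{k + 1}": c for k, c in enumerate(codes)}


def to_indicator_row(category_row):
    """Tix1..Tix4 codes -> one 0/1 column per event; 0 codes are ignored."""
    held = {category_row[f"Tix{k + 1}"] for k in range(MAX_TIX)}
    return {e: int(i + 1 in held) for i, e in enumerate(EVENTS)}


person = dict(Baseball=1, Basketball=0, Football=1,
              Symphony=0, Opera=0, Theatre=0, Ballet=0)
tix = to_category_row(person)
print(tix)  # {'Tix1': 1, 'Tix2': 3, 'Tix3': 0, 'Tix4': 0}
assert to_indicator_row(tix) == person  # the round trip recovers the row
```

Note how the indicator version needs one column per possible event, while the category version needs only as many columns as the largest number of choices.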

People often code data this way, especially for open-ended questions, like "write down all of your feelings or responses to a particular question." But I'll tell you right now that this kind of layout, Tix1 through Tix4, where any category can appear in any column, can get extremely cumbersome. In my experience, indicator variables, even though you need more of them, are more amenable to adding things up and to other analyses. With that in mind, let me show you how to set up a Multiple Response set.

The first thing you have to do is define what are called variable sets: the variables that should be treated as instances of a single category. You go up to Analyze, then down near the bottom to Multiple Response, and choose Define Variable Sets. You'll see two other options beneath that, Frequencies and Crosstabs. They aren't available yet because I haven't defined any sets. I click Define Variable Sets, and I'm going to do this twice: once with the indicator variables (the 0/1, yes/no variables) and once with the categorical ones, the four columns for the four sets of tickets people have.

So first I scroll down, pick the three sporting events, and move them over, and then I select the four cultural events and move those over too. Then the dialog asks whether these are dichotomies, like the 0/1 variables, or categories, where the values run from 1 through 7. These are dichotomies. It also asks which value counts as a yes, because the coding might be 0/1, but it might be 1/2 or something else.

I just have to indicate that 1 is the value that counts as a yes. Then I have to give the Multiple Response set a name. I'm going to call it TixDichotomies, with the label Dichotomous Variables for ticket purchases, and then I click Add on the right. This creates a Multiple Response set called $TixDichotomies. It won't show up in the data set, because it's more like metadata.

It's information about the data set that the computer saves. So I've done this, and now I can press Close. The data set doesn't look any different, but if I go back to Analyze and down to Multiple Response, the two other options, Frequencies and Crosstabs, are now available. For instance, I can click Frequencies, and there is the Multiple Response set I just created. All I do is move it over and press OK.

And I get a table showing how many people purchased each kind of ticket. This is the same information as the 0/1 indicators: it simply tells me how many people had basketball tickets, how many had opera tickets, and so on. So that's one way of doing it. I can also do cross-tabulations. If I go back to Analyze, to Multiple Response, to Crosstabs, I can look, for instance, at whether there are gender differences. I put the Multiple Response set in Column(s) and gender up here in Row(s).

However, I have to define the range for the gender variable. I click Define Ranges, tell it that the values are 0 and 1, and press Continue. Then I click OK, and I get what's called a cross-tabulation: it shows the number of men and women who have season tickets of each kind. We'll come back to crosstabs in a later movie, but I wanted you to see that this option exists for Multiple Response sets. Now, I can also do multiple responses with the other, open-ended layout, where any event can appear as a person's first set of tickets, second set, and so on. Let's look back at the data set.
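What the Frequencies and Crosstabs procedures compute for a dichotomy set can be approximated in a few lines of Python. This is a hedged sketch, not SPSS's implementation: the `DichotomySet` class, the helper functions, and the rows are all hypothetical, the 0/1 gender codes are left unlabelled on purpose, and the output is plain counts rather than SPSS's formatted tables. The set object mirrors the point that a Multiple Response set is metadata (a name, member variables, and a counted value) kept apart from the data rows.

```python
from collections import Counter
from dataclasses import dataclass

EVENTS = ("Baseball", "Basketball", "Football",
          "Symphony", "Opera", "Theatre", "Ballet")


@dataclass(frozen=True)
class DichotomySet:
    """Hypothetical stand-in for a set definition: metadata, not a column."""
    name: str
    variables: tuple
    counted_value: int = 1  # the value that "counts as a yes"


def mr_frequencies(rows, mrset):
    """For each variable in the set, count cases holding the counted value."""
    return {v: sum(1 for r in rows if r[v] == mrset.counted_value)
            for v in mrset.variables}


def mr_crosstab(rows, mrset, by):
    """Count set members within each category of the `by` variable."""
    table = {}
    for r in rows:
        cell = table.setdefault(r[by], Counter())
        for v in mrset.variables:
            if r[v] == mrset.counted_value:
                cell[v] += 1
    return table


def make_row(gender, *held):
    """Hypothetical case: a 0/1 gender code plus indicators for events held."""
    row = {e: int(e in held) for e in EVENTS}
    row["Gender"] = gender
    return row


rows = [
    make_row(0, "Baseball", "Football"),
    make_row(1),                      # no season tickets at all
    make_row(1, "Opera", "Ballet"),
    make_row(0, "Football", "Opera"),
]
tix = DichotomySet("$TixDichotomies", EVENTS)
print(mr_frequencies(rows, tix))
print(mr_crosstab(rows, tix, by="Gender"))
```

Storing the set separately from `rows` means the data themselves never change when a set is defined, which matches how the data view looks the same before and after.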

The categorical version is the four Tix columns at the end of the data set. I need only four, because four is the most that anybody purchased. To do this one, I go back to Analyze, down to Multiple Response, and define a new variable set. This time I scroll down, select the last four, First Season Ticket through Fourth Season Ticket, and move those over to Variables in Set. In this case, they aren't dichotomies; they're categories, so I need to give the range. There were seven possible choices, so the range goes from 1 to 7. Then I need to give it a name.

The last one was TixDichotomies, so I might as well call this one TixCategories, with the label Ticket Categories, and then I click Add. That shows up as another response set. I click Close, and now I can run the frequencies and crosstabs again using this set. So I go back up to Analyze, to Multiple Response, to Frequencies. I used the dichotomies last time, so I'll just double-click to move that set out.

I'll use the categories set this time and click OK, and you see I get the same kind of information; the data were just organized differently. I can do the crosstabs the same way: Analyze, Multiple Response, Crosstabs. This time I take out the dichotomies set and put in the categories set. I get the same output either way, which makes it seem that these two ways of creating multiple response sets are equivalent; however, there is a trade-off.
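The claim that a category set with range 1 to 7 gives the same counts as the dichotomy set can be checked with a small Python sketch. It pools the codes across the Tix columns and counts only codes inside the declared range, so the 0s used for empty slots drop out. The code-to-event mapping (Baseball = 1 through Ballet = 7), the rows, and the function name are assumptions for illustration.

```python
from collections import Counter

# Assumption: codes 1-7 follow the order the events were listed.
EVENTS = ("Baseball", "Basketball", "Football",
          "Symphony", "Opera", "Theatre", "Ballet")
TIX = ("Tix1", "Tix2", "Tix3", "Tix4")


def category_frequencies(rows, variables, low, high):
    """Pool codes across the columns, counting only codes in [low, high]."""
    counts = Counter()
    for r in rows:
        for v in variables:
            code = r[v]
            if low <= code <= high:  # 0 marks an empty slot and is skipped
                counts[code] += 1
    return {code: counts[code] for code in range(low, high + 1)}


# Hypothetical cases in the category layout.
tix_rows = [
    {"Tix1": 1, "Tix2": 3, "Tix3": 0, "Tix4": 0},  # Baseball, Football
    {"Tix1": 0, "Tix2": 0, "Tix3": 0, "Tix4": 0},  # no season tickets
    {"Tix1": 5, "Tix2": 7, "Tix3": 0, "Tix4": 0},  # Opera, Ballet
    {"Tix1": 3, "Tix2": 5, "Tix3": 0, "Tix4": 0},  # Football, Opera
]
freq = category_frequencies(tix_rows, TIX, 1, 7)
labelled = {EVENTS[code - 1]: n for code, n in freq.items()}
print(labelled)

# The same four cases in the indicator layout would give these counts.
expected = {"Baseball": 1, "Basketball": 0, "Football": 2, "Symphony": 0,
            "Opera": 2, "Theatre": 0, "Ballet": 1}
assert labelled == expected
```

The counts match only because each person lists an event at most once; the declared range is what keeps the padding 0s out of the table.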

For the Multiple Response set created from the categories, that is, from the columns where people could enter any of the answers, Multiple Response sets are about the only way to use those variables, and they are very limited in their application. The indicator variables I had on the left, on the other hand, are much more flexible: they can be used in other procedures, like the correlations and regressions we'll do later. That's why I almost always use indicator variables, a 0/1 variable for each choice.

The only trouble comes when you have a lot of possible responses. You could end up with a huge number of indicator variables, where you'd need only a few of these category columns. On the other hand, if you really have that many choices, you might be wise to collapse categories and combine them. Anyhow, the Multiple Response function in SPSS can be a nice way of dealing with situations where people can choose or write in more than one answer to a question. The procedure is flexible because it can use dichotomous indicator variables, a 0/1 variable for each possible choice, or a smaller number of categorical variables with several possible values each.

However, the procedure does limit you to frequencies, or to crosstabs with other nominal or ordinal variables. For these reasons, I generally recommend that you use dichotomous indicator variables. Still, the Multiple Response function is an important tool in your collection of data-analysis strategies.
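Collapsing many choices into a few broader categories, as suggested for data with a long list of possible responses, amounts to a lookup from each detailed code to a group, followed by one indicator per group. This Python sketch assumes the sports are codes 1 to 3 and the cultural events codes 4 to 7; the grouping table and function name are hypothetical.

```python
# Hypothetical grouping of the seven event codes into two broader categories.
EVENT_GROUP = {1: "Sports", 2: "Sports", 3: "Sports",
               4: "Cultural", 5: "Cultural", 6: "Cultural", 7: "Cultural"}
GROUPS = ("Sports", "Cultural")


def collapse(tix_codes):
    """Map one case's Tix codes (0 = empty slot) to a 0/1 indicator per group."""
    held = {EVENT_GROUP[c] for c in tix_codes if c != 0}
    return {g: int(g in held) for g in GROUPS}


print(collapse([1, 3, 0, 0]))  # {'Sports': 1, 'Cultural': 0}
print(collapse([5, 7, 3, 0]))  # {'Sports': 1, 'Cultural': 1}
```

In keeping with the advice to enter data in its least processed form, the collapsed indicators would be computed from the detailed columns rather than replacing them, so the finer breakdown is still available later.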

This video is part of SPSS Statistics Essential Training (2011), by Barton Poulson (52 video lessons).

