When you're doing an in-depth investigation of your data, there are times when you'll want to focus on just some of the cases, for example, all of the men over 50 who visited your website, or clients with outstanding payments, or people under 16 who have taken the SAT. Now, one way to deal with this is to sort the data and then delete all the cases that you don't want and save it as a new data file. This is an option, but it can get cumbersome, and you do run the risk of multiplying data files or losing track of what you've got. An easier way is to have SPSS select the cases of interest, and when this happens, the other cases are still in the data set, but are temporarily excluded from the procedures, and you can then switch to different selection criteria or you can return to the entire data set.
It's a more flexible and efficient way of working with interesting subgroups in your data. For this example I am going to be using the data set Searches.sav, which is information about Google searches on a state-by-state basis. The first several searches all have to do with statistical topics, for instance the SPSS Google search term or regression, and then I have some social media ones, and then I have some sports ones. One that's interesting at the right end of the data set--so I am going to scroll over--is an indication of whether a state has an outline for a high school statistics class, and maybe I would want to restrict my analyses temporarily to states that have this to see, for instance, if that's associated with their Google search patterns for statistical topics.
So the way that I am going to do this is I am going to select cases. I go up to the Data menu, and then I come down to the bottom to Select Cases. And the dialog box gives me several options. The first one is to simply include all the cases, which is what I have right now. The second one is If condition is satisfied, and the idea here is that a case is kept if, say, its score on one variable is equal to a particular value; I can also build the condition from more than one variable. And this is what I am going to use. I am going to select on whether they have the statistics education. That's going to be statistics_ed = 1.
I will show that to you in just a second. I also have the option of using a random sample of cases. If I have a large data set, sometimes it's a good idea to try doing an analysis on a small part of it, let's say 20% or 30% or 40%, and then trying again with other parts of the data to see if the patterns I found hold there. You can also select based on a time or case range, for instance all the customers from 2009 or from 2007. And the last one is Use filter variable. What happens is that when I do a selection, SPSS automatically creates an indicator variable at the end of the data set.
So if I have one already, this simply gives me the option of using that existing filter variable. The section below that, Output, is grayed out because I haven't done a selection yet, so I can't use those options. So what I am going to do right now is select If condition is satisfied, and then I click on the If button to say what my criteria are for the selection. What I want to use here is the variable about whether a state has a high school curriculum for statistics. That's near the bottom of the variable list on the left.
I can simply double-click on that and it puts it up in the Selection box. Now, my selection in this case is very easy. This is a 0/1 variable. It's called a dichotomous indicator variable. It only has two options. And I just want the 1s, so I am going to keep statistics_ed, which is already there, and I am going to add = 1. Once I've got that, I can go to the bottom and click Continue, and the condition shows up next to If condition is satisfied in the Select Cases dialog. Now, the options at the bottom in Output become available. The first one is to simply filter out the unselected cases. It's the default.
It's what I am going to use here. But I do have two other options that allow me to change the data set. The second one, Copy selected cases to a new data set, does exactly that. It creates a second data set. I have to give a name for that data set. And then if I want to work with just that one, it can be easier. Or I can get rid of the cases that I didn't select. There may be situations in which I want to do that. You can call that destructive editing. I usually just filter out the unselected cases, but it's up to you. So now that I have got my criteria specified by what I am selecting and what I am going to do with the unselected cases, I simply press OK.
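For reference, here is a rough sketch of how the two data-changing Output options can be written in SPSS command syntax. It assumes the condition statistics_ed = 1 from above; the dataset name StatsEdStates is a hypothetical name chosen for illustration, and the exact syntax SPSS pastes may differ by version.

```
* Option: copy selected cases to a new dataset (StatsEdStates is a hypothetical name).
DATASET COPY StatsEdStates.
DATASET ACTIVATE StatsEdStates.
FILTER OFF.
USE ALL.
SELECT IF (statistics_ed = 1).
EXECUTE.

* Option: delete unselected cases from the active dataset (destructive).
FILTER OFF.
USE ALL.
SELECT IF (statistics_ed = 1).
EXECUTE.
```

The first block leaves the new dataset active, so the original data is untouched. The second block permanently drops the unselected cases from whatever dataset is active, which is why saving under a new file name first is the safer habit.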
Now the output file shows me the syntax statements that it has used to create the selection. It doesn't show any charts here, because we haven't asked for any. But if I go to the data file, you can see that on the left the row numbers of a lot of the cases are crossed out, because not too many states have a high school statistics curriculum. Also, on the right side you can see there's a new variable there, filter_$, that says Selected or Not Selected. That's a 0/1 variable. If I turn off the value labels with the button on the toolbar, you can see that those are 0s and 1s underneath, but I will turn the labels back on now by clicking on the Value Labels button.
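The syntax in the output typically looks something like the sketch below. This is what SPSS usually pastes for a filter on a condition, assuming the variable name statistics_ed; the details may vary with your version.

```
* Build a 0/1 indicator for the condition and filter on it.
USE ALL.
COMPUTE filter_$=(statistics_ed = 1).
VARIABLE LABELS filter_$ 'statistics_ed = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
```

The COMPUTE line is what creates the indicator variable at the end of the data set, and FILTER BY is what excludes the 0 cases from later procedures without deleting them.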
So anything I do is going to work only with the cases that I have selected, which in this case are states with a high school statistics education curriculum. I will make a box plot, for example, of their SPSS searches. I click on Graphs, to Chart Builder, and then in the gallery on the bottom I go to Boxplot, and I am simply going to drag the one-dimensional box plot up into the canvas. And from there, I drag in the variable from the list that I want. I am going to take SPSS and drag that into the X axis.
Also, because I may have outliers here, it's nice to have an ID to know what states they are. I can go down to the Groups/Point ID tab, I can select Point ID label on the bottom, and then I need to drag in the variable that provides the labels. In this one it's the state code. So I come up to the variable list and drag the state code over, and now I am ready. I click OK. I first get a bunch more code, which is the syntax for what I have done. There is the GGRAPH command that gives the data set, and then here is the box plot.
This shows the distribution of Google search patterns in terms of how common that particular search is relative to others for several different locations, and you can see we have an outlier. It's Washington, D.C. up at the top, and they search for this term SPSS much more than the other states do. So anyhow, what I have here is selection criteria, the ability to temporarily or permanently select a subset of cases for a more thorough analysis, and this is a great feature of SPSS.
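If you prefer syntax, a comparable labeled box plot can be sketched with the EXAMINE command, followed by the commands that return you to the entire data set. This is not the GGRAPH code that Chart Builder generates, just a shorter alternative; the variable names SPSS and state_code are assumptions based on the description above and may not match the names in Searches.sav.

```
* Box plot of the SPSS search variable, outliers labeled by state (names assumed).
EXAMINE VARIABLES=SPSS
  /ID=state_code
  /PLOT=BOXPLOT
  /STATISTICS=NONE.

* Turn the filter off and go back to using all cases.
FILTER OFF.
USE ALL.
EXECUTE.
```

While the filter is on, EXAMINE uses only the selected states. After FILTER OFF, the filter_$ variable stays in the data set, so the same selection can be switched back on later with FILTER BY filter_$.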
It lets you really dive into your data and get the most out of it. In the next movie we'll look at a related procedure called Split File that also lets you work with subsets, but instead of reporting on just one subgroup at a time, it gives the results for all of them so you can make comparisons between the subgroups.