When you are getting ready to analyze your data, you may find that your data lives in more than one file. Now, SPSS lets you have more than one file open, but for a number of procedures the data needs to be in the same file. Fortunately, SPSS has a command that lets you combine data, either by adding new cases that have the same variables or by adding more variables for the existing cases, and in this movie I am going to show you how to do both of these.
I am beginning with a data set that's called Search1.sav. This is simply the top-left quadrant of the data file that we used in the last two movies. I have information about Google search patterns for a number of states. If you scroll down, though, you can see that I only have data through Montana. I have 27 cases here. I want to add the remaining states using the same variables, and what I have is another data file that has all the same variables in the same order but has the remaining states.
To do that, I come up to Data and I come down about halfway to Merge Files and this is where it asks me if I want to add cases--that's more observations with the same variables--or whether I want to add variables for the same cases. I am going to do both, but on this one I am going to add cases. Now, you can do this with either a data file that's currently opened--that's the top one, an open data set, but that's grayed out because I don't have another data set opened right now--or you can use an external SPSS data file.
I have that other data file. It's saved in the folder, and I am just going to open it up by clicking Browse. This one is just called Search2. I am going to double-click on that and then the full path shows up right here, and I am just going to click Continue, and so what it does now is it brings up a dialog box. It attempts to pair the variables by whether they come from the active data set or from the one that I am opening, but since I have the exact same variables in both of them, everything is paired up in the two of them. I can scroll down the list and you see that all the same variables occur.
If I wanted to, I could select Indicate case source as a variable. That's at the bottom of the list. What this would do is add a new variable to the data set that indicates whether each case came from the first data set or from the second data set, and it's a way of keeping things straight. I don't need it in this case because there is no overlap and there will be no confusion between the two of them. I am just going to press OK, and I get the syntax and the results that say it is adding cases.
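The add-cases step can also be pictured outside SPSS. What follows is a minimal Python sketch of the idea, not SPSS syntax and not what SPSS runs internally. The function name `add_cases`, the variable `example_GS`, the indicator name `source_flag`, and all of the values are assumptions made up for illustration; only the state names and the Search1/Search2 file names come from the walkthrough.

```python
# Conceptual sketch only: lists of dicts stand in for SPSS data files.
# All variable names and values below are invented for illustration.

def add_cases(active, incoming, source_name=None):
    """Stack the rows of `incoming` under the rows of `active`.

    Both inputs are lists of dicts that share the same keys (variables).
    If `source_name` is given, every row gets an extra variable that is
    0 for rows from `active` and 1 for rows from `incoming`.
    """
    merged = []
    for flag, rows in ((0, active), (1, incoming)):
        for row in rows:
            new_row = dict(row)  # copy so the inputs are left unchanged
            if source_name is not None:
                new_row[source_name] = flag
            merged.append(new_row)
    return merged


# Stand-ins for Search1 (through Montana) and Search2 (the remaining states).
search1 = [
    {"state": "Alabama", "example_GS": 1.0},
    {"state": "Montana", "example_GS": 2.0},
]
search2 = [
    {"state": "Nebraska", "example_GS": 3.0},
    {"state": "Wyoming", "example_GS": 4.0},
]

combined = add_cases(search1, search2, source_name="source_flag")
for row in combined:
    print(row)
```

Leaving `source_name` as `None` corresponds to what I did here: the cases are stacked and no source variable is added.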
I go back to the data set. Previously, I only went through Montana, and now you can see that I have added Nebraska all the way down to Wyoming. I still have the same variables in the same order; I just have more cases. On the other hand, maybe I have the cases I want but I want to add more variables, more information about them. What I have right now is just Google's search history. I can scroll through, and all of these end with _GS to indicate these are Google Search patterns. But I have other information about each state that would be useful in analyzing these patterns.
So what I am going to do now is add new variables to the data set. I go back to where I was before, I go up to Data, come down again to Merge Files, except this time I select the second option, Add Variables. Again, I have the option of using an open data set or an external data file, but the one I need isn't open. It's saved as an external file, so I am going to click on Browse and I am going to use Search3. I will just double-click on that.
There it is and I click Continue. Now, it's bringing up the data set. There is one variable that is excluded and it's state. Now, that's the key variable that I used in both of them as a way of lining things up. You can see for instance that it has state and then it has a plus sign in parentheses. That tells me that it's from the new data set that I am adding. Since state is already in the active data set, bringing it in again would be redundant; we don't need it twice. All I am going to do now is click OK and it tells me that it's adding a bunch of new variables.
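The add-variables step can be pictured the same way. This is a minimal Python sketch, again not SPSS syntax, and it assumes the rows are lined up by matching on state. The function name `add_variables`, the variable `example_GS`, and every value are assumptions invented for illustration; `state`, `has_NFL`, and `Division` are the variable names mentioned in the walkthrough.

```python
# Conceptual sketch only: lists of dicts stand in for SPSS data files.
# All values below are invented placeholders.

def add_variables(active, incoming, key):
    """Attach the variables in `incoming` to the matching rows of `active`.

    Rows are matched on `key`. The key is not copied a second time, and a
    variable that already exists in the active row is left alone. Rows in
    `active` with no match get None for each new variable.
    """
    lookup = {row[key]: row for row in incoming}

    # Collect the new variable names in the order they first appear.
    new_names = []
    for row in incoming:
        for name in row:
            if name != key and name not in new_names:
                new_names.append(name)

    merged = []
    for row in active:
        new_row = dict(row)  # copy so the inputs are left unchanged
        match = lookup.get(row[key])
        for name in new_names:
            if name not in new_row:
                new_row[name] = match.get(name) if match is not None else None
        merged.append(new_row)
    return merged


# Stand-in for the combined search data and for Search3 (extra state info).
searches = [
    {"state": "Alabama", "example_GS": 1.0},
    {"state": "Wyoming", "example_GS": 4.0},
]
search3 = [
    {"state": "Wyoming", "has_NFL": 0, "Division": "div_b"},
    {"state": "Alabama", "has_NFL": 0, "Division": "div_a"},
]

with_extras = add_variables(searches, search3, key="state")
for row in with_extras:
    print(row)
```

Matching on the key rather than on row position is a design choice in this sketch: it keeps each state's values attached to the right state even if the two sources list the states in a different order.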
I go back to the data set, and previously we stopped with the Google Searches, the _GS, but now you can see I have added several new variables (I am going to scroll through them), from has_NFL, whether a state has an NFL team, through Division. And so what I have done is in the first example I added new cases to the data set, I added new states, and in the second example I added new variables. And what this does is it takes three separate data files and combines them into one, which lets me do more analyses, such as comparing the relationships between the variables, than I would be able to do otherwise.
Now, the data may have been spread out across several sources, typically many different locally stored spreadsheets in an organization, and by merging the cases or the variables, you get into the much more productive situation of having all of your data in one place. When you have that, it's much easier to break things down, to compare groups, and to examine trends and outcomes. All of these can give you much more powerful insight into your data.