In this course, author Barton Poulson takes a practical, visual, and non-mathematical approach to the basics of statistical concepts and data analysis in SPSS, the statistical package for business, government, research, and academic organizations. From importing spreadsheets to creating regression models to exporting presentation graphics, this course covers all the basics, with an emphasis on clarity, interpretation, communicability, and application.
At first glance, SPSS resembles a spreadsheet. There are rows and columns of data where each column represents a variable, such as a customer ID number, a question on a survey, or a city's population, and each row typically represents a case, which could be a person, a company, an advertising campaign, or whatever. However, there's a lot more to SPSS than that. First off, SPSS has more than one window. It has two, or possibly three, windows. The window we are looking at right now is the Data Editor window, or Data window, and I have a sample data set called searches.sav opened.
This is a data set that contains information about Google searches for specific terms, such as SPSS or regression, for each of the 50 states and Washington DC, and I will be using this data set frequently as a sample during this course. If you look at the tabs on the bottom left, this is what's called the Data view. The Data view is the one that looks like a spreadsheet. However, there is also one called the Variable view. If you click on that, what you see is that it has information about the variables. The first column is the variable names.
Variables in SPSS have to have single-word names. They can be up to 64 characters, they can have underscores or dots, and they can be upper- or lowercase. Otherwise they need to be relatively short, and again they do need to be a single word. The next column is the type of the variable. A string variable, for instance, is a text variable, and the state codes like CA or NY are entered as text. Everything else in here is entered as numbers and they're numeric variables, even though several of them have words laid over on top of them. I will show you in a moment.
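As a rough illustration outside SPSS itself, the naming rules just mentioned (a single word, up to 64 characters, underscores or dots allowed, upper- or lowercase) can be sketched as a small Python check. The function name is hypothetical, and the requirement that a name start with a letter and the allowance for digits are assumptions not stated in the narration. SPSS applies further rules, such as reserved keywords, that this sketch does not cover.

```python
import re

# Maximum length mentioned in the narration.
MAX_NAME_LENGTH = 64

# Assumption: a name starts with a letter and may then contain letters,
# digits, underscores, or dots. No spaces, so it is a single word.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_plausible_variable_name(name: str) -> bool:
    """Return True if `name` passes the simplified checks described above.

    This is an illustrative helper, not an SPSS function.
    """
    if len(name) > MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["state_code", "state code", "degree"]:
        print(candidate, is_plausible_variable_name(candidate))
```

A name such as state_code passes, while a two-word name with a space does not.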
The third one is the width of the variable and the fourth one is the number of decimal places. The next one is what's called the Label. This means although the variables may have short names, like state_code, the label can be something that's a little easier to read, like State_code with capitalization. Or if you go further down to row 18, you see there is one called degree. That's the name of the variable, but the label is much longer. It is percent of population with bachelor's degree or higher. So the label can be much more descriptive, and since the label is what's going to show up in a chart or in a table, you want to make it long enough that it's easy to tell what it is.
The next column is Values, and I said that most of these variables are entered as numbers. Now some of them just are numbers. The Google search information is numbers. They tell you how high a particular search term rates, relatively speaking, compared to all others for a particular state. On the other hand, other variables, such as 15, 16, and 17, which are has NFL, has NBA, and has MLS for Major League Soccer, are Yes/No variables. Those are called indicator variables and I enter them as 0 for no and 1 for yes.
So the numbers are what's in the dataset, but you can see that I tell SPSS in values, if I come over it and click on that, that 0 equals No and 1 equals Yes, and you can add them and change them and remove them in this dialog box. The next column is whether you want to specify explicitly any particular value to indicate missing information. Say for instance a person forgets to answer a question. You may want to indicate that's an accidental omission. Perhaps you can give that a 999 to indicate that it's accidental. Or if you didn't ask a question because it wasn't relevant, you could give a different code like 888, or whatever you want. Just make sure it doesn't overlap with the valid information.
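To make the idea concrete, here is a minimal Python sketch of value labels and missing-value codes, written outside SPSS. The 0/1 labels and the 999 and 888 codes come from the narration; the function and constant names are hypothetical and only illustrate the logic, not how SPSS implements it.

```python
from typing import Iterable, Optional

# Value labels from the narration: the data holds 0/1, the labels are No/Yes.
VALUE_LABELS = {0: "No", 1: "Yes"}

# Example missing-value codes from the narration:
# 999 = accidental omission, 888 = question not asked.
MISSING_CODES = {999, 888}


def display_value(raw: int) -> Optional[str]:
    """Return the label for a raw value, or None if it is a missing code.

    Values without a label are shown as the number itself.
    """
    if raw in MISSING_CODES:
        return None
    return VALUE_LABELS.get(raw, str(raw))


def codes_do_not_overlap(valid: Iterable[int], missing: Iterable[int]) -> bool:
    """Check that no missing code collides with a valid data value."""
    return not (set(valid) & set(missing))


if __name__ == "__main__":
    print([display_value(v) for v in [0, 1, 999]])
    print(codes_do_not_overlap(VALUE_LABELS, MISSING_CODES))
```

The overlap check mirrors the advice above: a missing code is only useful if it can never be mistaken for a valid value.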
The next column is simply how wide the column is in the data set, and I make them 11 spaces by default. Let's scroll over a little bit here. Then there is alignment within the column: Left, Center, or Right. The last two are specific statistical things. This is what's called the Level of Measurement, and in SPSS a variable can be nominal, which means it simply indicates a different group. A string variable where you write words is nominal, but a 0/1 indicator variable is also nominal, and the region of the United States, which has four regions (1, 2, 3, 4), can be nominal too.
A variable can also be ordinal. You can indicate, for instance, the client with the largest account, then the second largest, and the third largest. The other choice in SPSS is what's called a Scale Variable, and you see there is a little ruler next to it. These are variables that are measured as more or less in set units so you can actually calculate statistics like an average for them, whereas you can't with a nominal variable. The very last column is called the Role, and this is a relatively new feature in SPSS. Here you specify, for instance, whether a particular variable is to be used as an input variable, that is, you're using it to predict values on other things.
These are sometimes called independent variables or predictor variables. A variable can also be a target variable, and that is, it's always something that you're trying to explain, like for instance spending on particular products. Or a variable can be both, sometimes an input, sometimes a target and you see them marked as both. Finally, a variable can also be marked as none. That means it's not an input or a target variable; it's simply there for a state code as an identifier or indicator. And so those are the options in the Variable View window.
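The level-of-measurement distinction described above (an average makes sense for a scale variable but not for a nominal one) can be sketched in Python as follows. The enum and function names are hypothetical and are not SPSS features. Refusing an average for ordinal variables as well is a simplifying assumption of this sketch.

```python
from enum import Enum
from statistics import mean
from typing import Sequence


class Measure(Enum):
    """The three levels of measurement named in the narration."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    SCALE = "scale"


def average(values: Sequence[float], measure: Measure) -> float:
    """Return the mean, but only for variables measured on a scale.

    Assumption: ordinal variables are rejected here too, for simplicity.
    """
    if measure is not Measure.SCALE:
        raise ValueError(f"an average is not meaningful for {measure.value} data")
    return mean(values)


if __name__ == "__main__":
    print(average([10, 20, 30], Measure.SCALE))
    try:
        # Region codes 1-4 are group identifiers, not quantities.
        average([1, 2, 3, 4], Measure.NOMINAL)
    except ValueError as err:
        print("refused:", err)
```

Tagging each variable with its level up front lets a program refuse analyses that would not make sense for that kind of variable.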
Let me go back to the Data view now. The next thing to note is you can actually have a lot of variables in SPSS. It's limited only by its ability to address the variables. It can address over two billion variables and two billion cases, which you are unlikely to hit in most situations. But this is the Data window. Now, what also makes this different, aside from the metadata in the Variable window, is that when you run a command in SPSS, unlike a spreadsheet, the result doesn't show up on the same page. For instance, I am going to quickly make a chart. I'm going to make what's called a histogram for "interest in SPSS" as a search term.
I go up to Graphs, and I click on something called Chart Builder, which I will demonstrate more fully in a later movie. I am going to pick a histogram and drag it up into what's called the Canvas, take SPSS, and put it down here. Now what's interesting is I have a lot of options about how I set this up (and we'll save that), but I want to show you two things. One is I can click OK and go straight from that dialog box, not to the Data window but to an Output window, and in the Output window I set it up so that it gives me the written code that can produce this chart over again.
That's the information about the commands, and there's the chart. But you see this is a separate window. We had a Data window; now we have an Output window. I am going to go back to the command for just a moment and show you an optional third window. Right next to the OK button there is something called Paste, and if I click that, it opens up a window called a Syntax window, and this is just command-line code. By pasting it, it has taken the written commands for this particular chart and put them in a Syntax window, and I can use it either to modify the commands or to recreate the command at a later time.
It's a great way of sharing information with people. So watch, I can simply highlight all of this and I can come up and press the big green Run button, the Play button. If I hit that, you will see that it's done it all over again. It's a great way of replicating analyses. For instance, you can set up an analysis when you have only part of the data, or you can run it periodically as new data comes in. It's a wonderful feature. Now let me show you a couple of other features here in SPSS. One for instance, is under the File menu and the Edit and the View.
These are common things. The Data menu allows you to do a number of procedures to modify the data, which I'll show in a few movies, and so does the Transform menu, which lets you create new variables. You can insert headings and titles in your output. Analyze is the actual statistical procedures menu; we will go through that. Now Direct Marketing here is a separate add-in. SPSS has a lot of add-ins that you can purchase separately to give increased functionality to SPSS, but I won't be demonstrating those. The techniques that I am going to be using in this particular course all involve the base procedures that are available in SPSS.
The next command is to make graphs, and I have a whole series of movies about those. Utilities can be a way of getting more information about the variables or about creating scripts and production jobs, which are more advanced procedures that we won't be covering in this course. Add-ons gets into some of the other services that you can purchase that connect with SPSS, such as SPSS Modeler, which is for data mining, and SPSS Text Analytics for analyzing open-ended natural language, like customer comments on a webpage or Twitter feeds. It's a great way to go.
And then finally, the Help menu here gives you a huge amount of information. Let me open up, for example, the Tutorials, and this opens up in a web browser, although it's a locally stored file. And what you see here is an entire collection of presentations that SPSS will run through to teach you how to do any of a number of procedures, and they are very useful for learning how to use SPSS in even more depth. Back in SPSS, there is also what's called the Command Syntax Reference. This is a 2500-page searchable PDF file about the command-line syntax programming that you may be able to use at a later point in more advanced work.
Now there are just a couple more things I want to show you in SPSS about how to set up the program. If I come back to Edit and go down to Options, there are a number of things you can do to customize the way SPSS works for you. There are a few in particular I want to point out. One is in this tab called Viewer. Down at the bottom, on the left, there's a checkbox for Display commands in the log, and that's the thing that makes it so that SPSS inserts the written code that produces each analysis, or each display, as you go through. I find it a very helpful thing to do, in addition to pasting the syntax into a Syntax window to be saved separately.
The other one that I think is important is under Output Labels, the second one from the right on the bottom. Output Labels lets you choose how variables are shown in the output. You may recall, for instance, that we had the variable called degree, which had a much longer label about percentage of population with a bachelor's degree or higher. You could either have that long label show up in the output and in the tables and in the figures, or you could have the short name, which is just degree, or you can have both of them.
Similarly, with the Value Labels, like for instance where I had whether a state had an NFL team, with 0 as No and 1 as Yes, Labels means you can have the yeses and the nos show, but you can also do it as 0s and 1s, and you can also do it as 0, No, 1, Yes. And I use one or the other depending on the situation. It can be a good way to keep track of things. Using just the labels can also be a way of making things more presentation-ready. So those are the options, and I encourage you to search through some of those a little bit more to see what else is there.
So in the organization of SPSS, yes, there is a superficial similarity to a spreadsheet, but you can see that it has been developed with an eye towards making statistical graphing and analysis much faster and more organized. Also, the option to Paste command syntax into its own window, and to save it as part of the output with each procedure, makes it much easier to keep track of what you do, to share with others, and to repeat analyses. Finally, SPSS's extensive help collection can make it easy for you to get directions and walkthroughs on nearly every procedure that SPSS does.
In the next video, we will talk about one other setup process, and that is getting data from an external spreadsheet into SPSS.