In this course, author Barton Poulson takes a practical, visual, and non-mathematical approach to the basics of statistical concepts and data analysis in SPSS, the statistical package for business, government, research, and academic organizations. From importing spreadsheets to creating regression models to exporting presentation graphics, this course covers all the basics, with an emphasis on clarity, interpretation, communicability, and application.
SPSS has a number of really wonderful tools for helping you to get an in-depth understanding of your data. We've already looked at the Frequencies and Descriptives commands, which can give you nearly everything you need under normal circumstances. However, there are times when you need to look at things even more closely, and this is where SPSS's Explore command comes in, with more ways to look at univariate statistics than you can shake a stick at. Let's look at some of those possibilities. To get to the Explore command you go up to the Analyze menu, to Descriptive Statistics, to Explore.
What you have here is a list of all the variables, both categorical and scale, on the side, and a number of options here. What we are going to do is take the variables that we want and put them in the Dependent List. Now the term Dependent here means dependent variable, or outcome variable, or the variables that you want statistics on. In this case, I'll use the same ones that I used last time. I'll use LastSale and I will use MarketCap.
Now Factor List is in case I want to break the results down by groups. For instance, if I wanted to do LastSale and MarketCap by different sectors, I could do that, but there are 12 different sectors and at the moment I don't feel a need for it. I can also label the cases, and this can be handy because this will give me some charts that show outliers, and in fact I'm going to do that by coming up and getting the stock symbol variable and putting that down there. Then I want to go through some of the options over here.
I can choose what statistics Explore gives to me. I click on Statistics and by default it's going to give me the mean and a confidence interval for the mean. That's an indication of how precise our estimate of the mean is and also, given our sample, what we think the true population value might be. We also have what are called M-estimators. That's a whole family of advanced, so-called robust estimators that work well when things are skewed or there are outliers, but it's rather advanced.
We are not going to deal with that. I can also get information about outliers, which might label them individually. I could do that. I don't think we need to. I could also get percentiles, where for instance it gives me the values for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles. You can do it manually in the Frequencies command, but it's nice to have it as a one-click option. However, I usually don't need that, so I am going to skip it right here. I'm just going to click Continue. So I am leaving the statistics at the default.
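To make the percentile option concrete, here is a minimal Python sketch of how a percentile can be computed by hand. It assumes one common interpolation rule (position (n + 1) × p / 100 in the sorted data); statistical packages differ in the exact rule they use, so SPSS's values may not match these exactly. The function name and the sample prices are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0 < p < 100) of a non-empty list.

    Assumed rule: linear interpolation at 1-based position (n + 1) * p / 100
    in the sorted data, clamped to the smallest and largest values.
    """
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p / 100      # 1-based position in the sorted data
    if pos <= 1:
        return data[0]
    if pos >= n:
        return data[-1]
    lower = int(pos)             # 1-based index of the value just below pos
    frac = pos - lower           # how far pos lies toward the next value
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


# Hypothetical stock prices, not taken from the course data set.
prices = [3.5, 7.25, 9.0, 12.4, 18.7, 22.1, 30.0, 41.8, 65.0]
for p in (5, 10, 25, 50, 75, 90, 95):
    print(p, percentile(prices, p))
```

The loop uses the same seven percentiles that the Explore option reports, which is why it is convenient to have them as a single checkbox.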
That already gives me a ton. Next, I am going to look at Plots, or the graphs. Now the first thing it can give me is box plots, and we've done those separately in the univariate charts. And it's going to put the factor levels together, which is fine, because I'm not splitting things up by a factor. It can also give me something called a stem-and-leaf plot, which is something that's normally drawn by hand, but I will show you that in a moment. I can get a histogram if I want. I've done those before, but I can get them additionally here. The next one is normality plots with tests.
This is a series of plots that are designed to see how well your data fit a symmetrical normal distribution, which is the mathematical definition of a bell curve. Normality is the term for it, and that's important for a lot of statistics, but the normality plots can be a little tricky to read, and usually you can eyeball your data and see whether they seem to be behaving well, in a way that will work with a lot of other statistics. So I am going to skip both of those. I'll just click Continue, and let's take a quick look at Options.
Now this is one where it asks what to do with missing values in case I'm looking at more than one variable in my Dependent List, which I am. The question is whether I want to exclude cases listwise or pairwise. And this is something that comes up in a number of other procedures, and it's worth pointing out. When you exclude cases listwise, what that means is you only include the case if it has information on every variable that you're including. So let's say I had ten variables in the Dependent list. If a case was missing information on one of those, it would not be included.
On the other hand, pairwise says include a case for every variable on which it does have information. So it makes maximum use of the information, but you can end up with very different sample sizes from one variable to the next, and there are procedures where it's very important to keep the sample sizes consistent going across. For Explore, that's a judgment call. You can do it either way. You can do it both ways if you want, one after the other. But I am just going to keep it listwise for now, the way it is. Click Continue and then down here it gives me the option to display just the statistics, just the plots, or both.
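The difference between the two missing-value rules is easiest to see by counting cases. The following Python sketch uses a handful of made-up records (the symbols and values are hypothetical, and `None` stands in for a missing value) and reports how many cases each variable would be based on under each rule.

```python
# Hypothetical records; None marks a missing value.
cases = [
    {"symbol": "AAA", "LastSale": 10.0, "MarketCap": 500.0},
    {"symbol": "BBB", "LastSale": None, "MarketCap": 120.0},
    {"symbol": "CCC", "LastSale": 25.5, "MarketCap": None},
    {"symbol": "DDD", "LastSale": 7.25, "MarketCap": 80.0},
    {"symbol": "EEE", "LastSale": 3.0, "MarketCap": None},
]
variables = ["LastSale", "MarketCap"]


def listwise_counts(cases, variables):
    """Keep a case only if it has a value on every listed variable."""
    complete = [c for c in cases if all(c[v] is not None for v in variables)]
    return {v: len(complete) for v in variables}


def pairwise_counts(cases, variables):
    """For each variable, keep every case that has a value on that variable."""
    return {v: sum(1 for c in cases if c[v] is not None) for v in variables}


print("listwise:", listwise_counts(cases, variables))
print("pairwise:", pairwise_counts(cases, variables))
```

Listwise gives the same sample size for every variable, while pairwise uses more of the data but lets the sample size change from one variable to the next.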
I will leave it at both, which is the default. I click OK and I get a lot of output. The first table tells me how many cases there are, how many have valid data, and how many are missing. There are 2,816 cases with valid data, and for each variable, LastSale and MarketCap, I have four cases that are missing information. That's only about 1/10th of 1%. Then I have a table called Descriptives. I scroll down and I have the mean. The mean for LastSale is $18.7, and I've seen these statistics elsewhere, but this one gives me a confidence interval for the mean, which is an inferential statistic, and we will see more about those in the next section.
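As a rough illustration of what a confidence interval for the mean involves, here is a small Python sketch. It assumes a normal (z) approximation, which is reasonable only for large samples like the one here; statistical packages typically use the t distribution, so their intervals will be slightly wider for small samples. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (lower, upper) bounds for the mean at the given confidence level.

    Assumption: normal approximation (z critical value), not the t distribution.
    """
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)    # sample SD divided by sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error


# Hypothetical values, not the course data.
sample = [10, 12, 14, 16, 18]
low, high = mean_confidence_interval(sample)
print(f"mean = {mean(sample)}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval gets narrower as the sample gets larger or the data get less spread out, which is why it serves as an indication of how precisely the mean has been estimated.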
We also have something called a 5% Trimmed Mean. It throws away the highest and lowest few percentage points of the data and gives a slightly more stable estimate. We have the median and the indicators of spread, with the variance and the standard deviation, and then we have several other statistics: the interquartile range and the skewness and kurtosis. So this is a lot of statistics that it gives all at once. You don't need all of them, but the nice thing is that they are available there. The second column, by the way, gives what are called standard error estimates for a few of the statistics: for the mean, the skewness, and the kurtosis.
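The idea behind a trimmed mean can be sketched in a few lines of Python. This is a simplified version that drops a whole number of cases from each end of the sorted data; it is an assumption for illustration, and a statistical package may handle fractional cases differently, so its result can differ slightly. The function name and data are hypothetical.

```python
def trimmed_mean(values, percent=5):
    """Mean after dropping the lowest and highest `percent` of the cases.

    Simplification: the number of cases dropped from each end is rounded
    down to a whole number.
    """
    data = sorted(values)
    k = len(data) * percent // 100        # cases to drop from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value at the top.
values = list(range(1, 20)) + [1000]
print("plain mean:  ", sum(values) / len(values))   # pulled up by the outlier
print("trimmed mean:", trimmed_mean(values))         # much closer to the bulk
```

With one extreme value in twenty cases, the plain mean is dragged far away from the bulk of the data, while the trimmed mean stays near the middle, which is the sense in which it is more stable.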
These are sometimes used as inferential statistics, but we don't need to worry about them right now. Then it repeats the table for the second variable, market capitalization. Then we have what are called the stem-and-leaf plots. These are ones that are usually drawn by hand, and what it does is it takes the values and splits them up into two-digit numbers, where the first digit is what's called the stem, and it forms the line here on the side. The second digit is the leaf, and the neat thing about this is this can be read as a histogram.
It's sort of a sideways histogram. But it also maintains the actual numerical values. So it's both a literal display of the data and a chart like a histogram, and then it marks some extreme cases separately at the bottom. Then here's a box plot. This is labeling the outlying cases by their stock symbols, and then we do a similar thing for market capitalization. So the biggest impression you might get might be that the Explore procedure is good for producing enormous amounts of output. It can be overwhelming, but if you really want to get the best picture, meaning the most comprehensive, not necessarily the most interpretable or useful picture, then the Explore command is the procedure of choice.
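The stem-and-leaf display described above is simple enough to build by hand, and the following Python sketch shows the basic idea for two-digit whole numbers. It is only an illustration: the function name and values are hypothetical, and it leaves out extras that SPSS adds, such as frequency counts and the separate listing of extreme cases.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines for a non-empty list of whole numbers 0-99.

    The tens digit is the stem and the ones digit is the leaf.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(str(leaf))
    lines = []
    # Include empty stems so the shape still reads like a histogram.
    for stem in range(min(stems), max(stems) + 1):
        lines.append(f"{stem:>2} | {''.join(stems.get(stem, []))}")
    return lines


# Hypothetical values, not the course data.
for line in stem_and_leaf([12, 15, 21, 23, 23, 38, 40]):
    print(line)
```

Each row's length shows how many values fall in that range, like a histogram bar turned on its side, while the digits themselves preserve the original values.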
It can give you stem-and-leaf plots. It can give you confidence intervals and trimmed means. It can give you robust estimators. It can give you normality plots, among other things, if you ask for them, all of which recommend its use in particular circumstances. On the other hand, the slightly simpler procedures of Frequencies and Descriptives can still give you nearly all of what you need without deluging you with output. Nevertheless, if there's one thing SPSS is good at, it's providing you with options, and the Explore command is one with especially rich options and analytical value.