When you're looking at what SPSS calls a scale variable--that's something that can be measured as more or less, like the percentage of critics who gave a favorable rating to a movie or the budget or the box office earnings for that movie--you should generally make two kinds of charts. The first one, which we did in the last movie, is called a histogram. It's like a bell curve, and it's a good way of getting a feel for the overall shape of a distribution. The second kind that you should generally make for a scale variable is called a box plot, and its primary purpose in this context is to check for outlying scores, because they can cause a lot of problems in later statistical analyses.
So you need to be able to identify whether you have outliers and often what those outliers are. So what I'm going to do now is I'm going to create a box plot for budget, which we used in the last movie on histograms. Come up to Graphs, to the Chart Builder, and from there I come down to the list, to Boxplot. There are several different versions of box plots. I am going to choose the simplest one possible. That's this one over here, which is called a 1-D Boxplot. It's for charting all of the cases on a single variable.
If I wanted to break down budgets by a genre of film, I could do that over here, under what's called a Simple Boxplot, but it's grouped, and I will show that in a later movie. But right now I'm simply going to drag the 1-D Boxplot up to the canvas, and then I'm going to bring in budget to the Y axis. This is the general format of a box plot. I will explain more when we look at the finished version. But I am going to do a couple of things. Number one is I may want to identify points.
If I click on Point ID Label, I can get the movie name and drag that into here, so if I have unusually high or low points, it will actually tell me what the movie is. It makes life easier. I can also put titles on. I will have a title, and I will put Boxplot of Movie Budgets. Then I will press Apply, and for both of these I can now press OK over here. And what comes up is this particular chart. This text is the syntax for the command that produces the chart.
This is the name of the command, this is the data set, Movies.sav, and this is the Boxplot of Movie Budgets. What you have here is budgets ranging from 0--there's actually nothing with 0--up to $250 million for the movie. This is from a few years ago. And this box right here shows the quartiles of a distribution, and this is the minimum value of any movie in the data set. This right here is the highest non-outlying value, and I say non-outlying because we have two outlier movies.
In this particular data set, Spiderman 2 and King Kong both had budgets of approximately $200 million. On the other hand, this box down here shows you the median, meaning that 50% of the movies--there were 61 in this data set, so 30 of them--had budgets beneath this, which is around $25 or $30 million, and half of them were above. Now, I am going to show you a few ways to modify this chart that I think will make it a little easier to deal with.
As with every chart in SPSS, you modify it by first double-clicking on it to activate it. That brings up the chart in a Chart Editor window and it brings up a Properties window to the right. Now, one thing that I personally like to do is I like to turn these charts sideways by coming up to the button bar and clicking on the button that says "Transpose the chart coordinate system." The reason I do this is because the other charts that we make up, like histograms and like the scatter plots that we will show later, they have these variables listed across the bottom, with the lowest value on the left, highest value on the right, and I find it helpful to be consistent in this particular way.
I'd like to change the color of the chart. I click on the box, come over here to change the fill, and then the border I can change to another color if I want. I can change the way these bars work at the end. These are sometimes called whiskers. They go to the lowest and the highest non-outlying value. In case you're wondering, outliers are values that lie more than one and a half times the width of this middle box (the interquartile range) above or below the box. What I'm going to do now is change the way these whiskers look.
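To make the outlier rule concrete, here is a minimal Python sketch of how the whisker ends and outliers in a box plot can be worked out by hand. It is not what SPSS runs internally: the function name `boxplot_summary` and the budget figures are hypothetical, and it assumes Python's "inclusive" quartile method, whereas SPSS may compute the box edges slightly differently, so its results can differ a little on real data.

```python
from statistics import quantiles


def boxplot_summary(values, k=1.5):
    """Return the box, whisker ends, and outliers for a list of numbers.

    Assumption: quartiles use the 'inclusive' method from the standard
    library; other software may use a slightly different convention.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # width of the middle box

    # Anything beyond k box-widths from the edge of the box is an outlier.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # lowest non-outlying value
        "whisker_high": max(inside),  # highest non-outlying value
        "outliers": outliers,
    }


# Hypothetical budgets in millions of dollars (not the Movies.sav data).
budgets = [5, 10, 15, 20, 25, 30, 40, 60, 200]
print(boxplot_summary(budgets))
```

With these made-up numbers the box runs from 15 to 40, so the upper cutoff is 40 + 1.5 × 25 = 77.5. The whisker therefore stops at 60, and the 200 value is flagged as an outlier, which is the same pattern the chart shows for the two big-budget movies.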
This is just a preference issue. I click on that, and I come over here to Bar Options, and I am going to change it from a T-bar to what's called a Whisker. It's just a line at the end. And then here, if I want to, I can actually change the way that these look at the end. I have the movie labels there as well. Finally, if I want to change the Axis labels here on the bottom, like I did with the histogram where I changed these to millions of dollars, I click on the numbers, and I come over to the Properties window, to Number Format, and the Scaling Factor here, I'm going to put in millions.
I am going to press Apply, and this now gives me millions of dollars. And I need to change this--it says Budget--to say Budget in Millions. I can close the chart, and now I have a good depiction of three things: that the overall distribution is on the low end, because these are movies that included award winners; that half of the movies have budgets of 30 million or less, but they go up to about 150 million; and that in this particular data set we had two other movies--Spiderman 2 and King Kong--that had unusually large budgets, as is common among summer blockbusters.
Anyhow, when you're looking at a scale variable like budget, like viewer evaluations, like time spent on tasks, like time spent viewing a web site, then you do want to look at both the overall shape of the distribution with the histogram and you want to check for outliers, and a box plot is an ideal way to do that.