# Creating side-by-side boxplots

In the last movie on graphing, we looked at how SPSS could create boxplots for a single scale variable broken down by the groups of a single categorical variable. Another variation on boxplots that can be handy is to show boxplots for several different variables side-by-side. While this isn't technically a chart of the association between variables, it's a very useful chart that addresses multiple variables. These side-by-side boxplots work well as a shortcut method for checking outliers on several variables at once.

They are a great presentation graphic for showing the distribution of several variables, and in that way they could be considered a much more compact alternative to showing multiple histograms. The only real catch is that your variables need to be on the same scale. For instance, they could all be opinion questions on a 1 to 5 scale from strongly disagree to strongly agree, or they could all be dollar values in thousands of dollars. The other trick is that this feature was not included in SPSS's otherwise remarkable and comprehensive Chart Builder. Instead, we will need to use what SPSS calls a legacy dialog, and here is how it works.

For this example I am going to be using the Google search data because it has multiple interesting variables on the same scale. I am going to go to Graphs, down to Legacy Dialogs, and from there I go down near the bottom to Boxplot. Now I have a choice here of Simple, which means without breaking things down by group, or Clustered, where I am breaking things down by groups. In this particular case I want to choose this option that says Summaries of separate variables, and I click Define.

All I need to do is pick the variables that I want to put in. Just to show what you are able to do, I am going to take all of the Google search terms, from SPSS down through FIFA, and put them into Boxes Represent. Also, because when you are looking for outliers you often want to know who they are, I am going to take the State Code variable, right here, and put that into Label Cases by, and that's all I need to do. Now I click OK, and what we get is the syntax pasted at the top and then what's called a Case Processing Summary.
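To make the idea of labeled outliers concrete, here is a minimal Python sketch of the usual boxplot rule: a case is flagged when it lies more than 1.5 box lengths (interquartile ranges) beyond the edge of the box. This is an illustration, not SPSS's own code. The variable names, state codes, and values are all made up, and the quartile method used here may differ slightly from the hinges SPSS computes.

```python
from statistics import quantiles

def boxplot_outliers(values, labels, k=1.5):
    """Return (label, value) pairs lying more than k IQRs outside the box."""
    # Quartiles via linear interpolation; an assumption, SPSS may use hinges.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [(lab, v) for lab, v in zip(labels, values)
            if v < low_fence or v > high_fence]

# Hypothetical data: two variables on the same scale, one value per case.
scores = {
    "term_a": [-0.4, -0.1, 0.0, 0.2, 0.3, 5.1],
    "term_b": [-3.0, -0.2, 0.0, 0.1, 0.2, 0.4],
}
codes = ["AA", "BB", "CC", "DD", "EE", "FF"]  # stand-ins for state codes

# One outlier list per variable, mirroring one box per variable in the chart.
flagged = {name: boxplot_outliers(vals, codes) for name, vals in scores.items()}
for name, found in flagged.items():
    print(name, found)
```

Running the same check across every variable is what the side-by-side chart does visually, and the label on each point plays the role of Label Cases by.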

It's simply SPSS telling me how many cases it used, and that we had valid data on all 51 cases, which is convenient. And then below that is the actual chart. Now this is a very busy chart, and I am going to show you a couple of ways that we can clean it up and make it even easier to deal with. I am going to double-click on it, and the first thing I am going to do is transpose the chart and turn it sideways by going to the upper-right and clicking on this button that says Transpose chart coordinate system. From there, I can change various elements of the chart.

I am going to change the colors by double-clicking on those and I will just change them to something else. Also, I am going to change the markers for the outliers and I will make them a little smaller and I will put them in the same fill and apply those. I will do the same thing for Utah over here, except that's nearly invisible now. I will use a darker one. There we go! Okay, then I'll make the text over here slightly larger and what I can see from here is that each of these variables was designed by Google to be centered around 0 because that's the national average.

What it's showing us is states that are above or below the national average. We see for instance that Washington D.C. is an outlier on several of them, for Totally Lost, for Data Visualization, and for Statistically Significant as well as Regression and SPSS. We can see that there is only one low outlier anywhere, and that's Arkansas on American Idol. Finally, the furthest outlier we have on anything is on Modern Dance and it's Utah, which is over 5 standard deviations above the national average which is pretty extraordinary.
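If it helps to see what "standard deviations above the average" means in numbers, here is a small Python sketch that rescales a variable so its mean is 0 and its spread is 1. The values are invented, and this is only an assumption about how such scores could be produced, not a description of how Google actually built these variables.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # population SD; using the sample SD is also common
    return [(v - m) / s for v in values]

# Hypothetical raw interest levels for ten cases; the last one is unusually high.
interest = [10, 12, 11, 9, 13, 11, 10, 12, 11, 41]

z = z_scores(interest)
for raw, score in zip(interest, z):
    print(raw, round(score, 2))
```

After rescaling, a case at 0 sits exactly at the average, and the sign tells you whether it is above or below. Putting every variable on this kind of scale is also one way to satisfy the same-scale requirement for a side-by-side boxplot.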

Anyhow, you can see that a side-by-side boxplot gives a quick and compact way to look at the distributions of several scale variables at once. You can check for outliers. You can also use them as presentation graphics. They're a handy alternative to multiple histograms, and you should always consider side-by-side boxplots when you have several scale variables that you want to analyze together.

