
Easytofollow video tutorials help you learn software, creative, and business skills.Become a member
In the last movie, we looked at how to take a table of descriptive statistics in SPSS and then copy and paste it into a spreadsheet, and then in that spreadsheet to rearrange, delete, and modify the values in there to make them easier to communicate. In this movie, I want to show you how to take one particular kind of table, a correlation matrix, and work with that in a spreadsheet to clean it up and make it much easier to deal with, where you can go from potentially thousands of numbers to a small handful and present them in a way that makes them much, much easier to follow.
For this example, I'm going to be using the same dataset and the same variables I did in the last one, the Google searches information and searches.sav. And the first thing I need to do is get a correlation matrix, so I'll come up to Analyze, to Correlate, to Bivariate Correlations. Now I find it helpful to take the outcome variable and put that in first so it shows up in the left column. In this case, that's the relative interest in SPSS as a Google search term. The other terms that I used were Business Intelligence and Data Visualization.
I also used an indication of education with the percentage of the state's population with a bachelor's degree or higher. I used the Median Age and then I used three indicator variables for the region of the United States. Now even though there are four regions with indicator variables, you only need one less indicator than the number of categories. So for instance, when we have the two categories of gender, we only need a single indicator variable to indicate one or the other. With four categories, we only need three because the fourth category is implied by zeros on the three variables.
But I can highlight the three of those and move them over and now I just click OK. Now I have a correlation matrix here and as far as correlation matrices go it's not huge. I've had ones with hundreds of variables on each side. But you see that we have the variables listed down this side and the same variables across the top, and we have several statistics in the cells for each one. Please note that at this point the statistically significant correlations have Asterisks next to them. What I'm going to is I'm going to rightclick on this table and copy it.
Then I'm going to go to a spreadsheet. I'm using Excel in this particular case and I'm going to paste this not into cell A1 but into B1. And the reason I'm going to do that is I find it very helpful to have an index column at the beginning that allows me to restore the order of things. So I have 1, 2. I can select those and drag down and propagate the order list. Great! And now what I can do is I can start deleting and reformatting.
So for instance, you see in row one, the word Correlations is a single merged cell. That's going to make it difficult to sort things. So I'm going to simply delete that. Then you can see that in column B, the search terms are merged cells across three rows. This also causes problems. The way to deal with that is to simply delete the column. So I've lost the names of the variables but I can get those back because I have the same variables listed across the top.
However, I don't need the Pearson correlation and the probability level and the sample size. All I really want is the correlation, so I'm going to get rid of the other two. Simply click on a cell in that row and then I can sort the entire table. Now I have the Ns. They're all 51, so I don't need those in my table. Then I have the Pearson Correlations, then I have the Sig. (2Tailed). Those are the probability levels. I don't need those. I need to indicate them in a way and I'm going to delete them for right now.
So now all I have are the correlation coefficients themselves. I'm going to sort this again to try to get the titles on the top. I'm going to cut this and then insert it back beneath the titles. Then in order to get the variable list back on the side where it says Pearson Correlation, I highlight the list here, I copy that, I come back to this first one and rightclick, and I do Paste Special and Transpose.
And that switches it from horizontal to vertical. And so you see now I have the variables listed again. Now I'm going to do something else. I don't need all of these variables here. I'm mostly interested in just predicting SPSS, so I can highlight all of those and I can delete them. Also I don't need the SPSS correlated with itself. Now I can remove the borders. I can get this one flush left. I'm going to stretch this out a little bit, but this one is too long so I'm just going to call it Degree, and I'll make these other two a little shorter and center this one.
I don't need three decimal places. Two is plenty. But now I need to indicate which ones are statistically significant. I'm going to delete this column also. Unfortunately, we had asterisks in the SPSS table to indicate which correlations were statistically significant, but we lost them when we pasted into Excel. That's not a big problem though. Wwe could go back and manually check, but I know another way of doing this. I've provided a spreadsheet called CorrelationProbabilityFormulas and what you can do with this one is you simply enter the sample size.
in this particular case we have 51, and it will tell you what absolute value of correlation is statistically significant. In this case, it's 276. So anything that is greater than an absolute value of 276, so negative that goes past or positive that goes past it, is statistically significant. So I can go back to my table here and I can do a quick conditional formatting. Now it's a little silly when I only have seven numbers here. But the point is this works just as well as thousands of numbers.
I highlight the numbers, I come over to Conditional Formatting, I click on that, and I'm going to create a new rule. And I want to format only cells that contain values that are not between 0.276 and positive 0.276. So the values have to be more extreme than that. Then I go to Format and I can choose Fill and maybe I'll make them yellow and I press OK.
And when I do that, I see that the top three correlations are all statistically significant because they have absolute values greater than 0.276. Now it's also helpful to create a legend and highlight it in the same color, so it's clear that that color means something. If I want to, I can put a border around this. Many of you will have training in designing graphics and you'll find ways to make this even clearer. But what I've done here is I've taken let's look back at the original correlation matrix.
It's huge. There's hundreds of numbers here. And I've boiled it down to seven numbers and even then I've highlighted the ones that are statistically significant to make it easier to find. So this is one way to take the output of SPSS and to transform it into a way that makes it easier to communicate and easier to understand. In the next video, I'm going to show you how to integrate the results of a regression analysis to compare this and try to make the patterns clear across the two ways of analyzing the data.