In the last movie, we used correlations to look at the strength of association between two variables. However, correlations are standardized measures. That is, they don't involve a unit of measurement. It's not a correlation of 0.78 meters or anything. It's just a correlation of 0.78. And while that can be really handy, because it makes it easier to compare associations across different kinds of variables, it can also be really nice to put the association back into the original metric. To do that, we'll look at another procedure that's very closely related to correlation and that has many of its advantages, but that also uses the original units of measurement.
That is bivariate linear regression. As a note, SPSS has a wonderful new procedure called Automatic Linear Modeling that also performs linear regression, which we'll cover a little bit later. For now though, it makes more sense to stick to the standard linear regression. First, we're only using one predictor variable, and Automatic Linear Modeling seems a little like overkill for that. And second, Automatic Linear Modeling does an awful lot of work behind the curtains, and it's kind of nice to keep things visible for right now. With that in mind, here's how to do a bivariate linear regression in SPSS.
For this example, we'll be using the Google Search data again, Searches.sav, where we will be using the percentage of people in a state with bachelor's degrees or higher as a way of predicting the relative level of interest in Facebook as a Google Search topic. To do this, we go first to Analyze, and then we come down to Regression, and we go to the second one down, Linear. We need to take our outcome variable, that is, the thing we're trying to predict, and put it in the Dependent box.
This means dependent variable, or the variable that depends on other variables. In this case, that's going to be Facebook, that is, Facebook as a relative interest in Google searches. The Independent box holds the variables that we're going to use as predictors. In this particular case, I'm going to be using the Percent of Population with a bachelor's degree or higher. Now the linear regression command is actually tremendously sophisticated and gives tons of options, none of which I'm going to use at this particular moment. I'm doing the simplest possible version here of simply using the Percent of Population with a bachelor's degree or higher to predict Facebook interest on Google Searches.
And I'm going to do nothing else at this moment. All I'm going to do now is press OK. The first table I get tells me that the predictor is the percent of population with a bachelor's degree or higher and that the model is using Facebook interest as the dependent variable. The next table down gives me an indication of the association. We have a correlation here of 0.644. That's the R. Now, it's a capital R here, because that actually stands for multiple correlation, which means you can use several variables to correlate with a single outcome.
Although in this case we only have two variables, so it's still bivariate. And then you have another one here that's called R Square, and that is, the 0.415 is the square of the number next to it, the 0.644. And the reason you do this is because you can't really compare correlation coefficients directly. They are not on a linear scale. A correlation of 0.4 is not twice as strong as a correlation of 0.2, even though the number is twice as big. Instead, if you square them, then you get numbers that are directly comparable: a correlation of 0.4 squared becomes 0.16 and a correlation of 0.2 squared becomes 0.04.
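The squaring described here is easy to check by hand. The following is a small Python sketch, not anything SPSS produces; the function name `r_squared` is made up for illustration, and the only numbers used are the ones read off in the transcript.

```python
def r_squared(r):
    """Square a correlation to get the proportion of shared variance."""
    return r ** 2


# The R reported in the Model Summary table.
model_r = 0.644
print(round(r_squared(model_r), 3))  # matches the R Square of 0.415

# Comparing two correlations on the squared scale.
weaker = r_squared(0.2)    # about 0.04
stronger = r_squared(0.4)  # about 0.16
print(round(stronger / weaker, 2))  # ratio of the squared values
```

The ratio is computed on the squared values rather than on the raw correlations, which is the comparison the squared scale is meant to support.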
And so the larger correlation is actually four times as strong. You also have something called Adjusted R Square. Sometimes people report R Square, sometimes they report Adjusted R Square. Adjusted R Square changes the number according to the ratio of observations to predictors. We also have the Standard Error of the Estimate, which goes into the probability values. And the next table is the ANOVA table. That's short for analysis of variance, and it's an indication of the statistical significance of the model as a whole.
If we had more than one predictor, then this would be an important thing, but because we have only one predictor and we know it's statistically significant, it doesn't really tell us anything extra right now. The next one down from that is Coefficients, and what we see here is the slope and the intercept that we are familiar with from charting relationships. The Unstandardized Coefficients are the slope and the intercept in original units. And so what we see is, if we're trying to predict the level of interest in Facebook on a state-by-state basis, we have an intercept here of 3.240.
That says give everybody an interest of about three standard deviations above the mean, but then for every percentage point of the population that has a bachelor's degree or higher, subtract about a tenth of a point from that. That's the -0.119. And that means it's downhill. The higher the level of education, the lower the interest in Facebook as a Google search term. This will become clearer if I quickly make a scatterplot of the association between the two variables. I've already shown how to make a scatterplot, so I'm going to go through this a little bit quickly.
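The intercept and slope from the Coefficients table can be written out as a prediction rule. Here is a minimal Python sketch using the two unstandardized coefficients quoted above. The function name and the example percentages are hypothetical, and it assumes the predictor is expressed in percentage points (25 for 25 percent), which is how the slope is described in the transcript.

```python
# Unstandardized coefficients read from the Coefficients table.
INTERCEPT = 3.240
SLOPE = -0.119  # change in predicted interest per percentage point


def predicted_facebook_interest(pct_bachelors):
    """Predicted Facebook search interest for a given percent with a degree."""
    return INTERCEPT + SLOPE * pct_bachelors


# Hypothetical percentages, chosen only to show the downhill trend.
for pct in (20, 25, 30, 35):
    print(pct, round(predicted_facebook_interest(pct), 3))
```

Each additional percentage point lowers the predicted value by 0.119, so the printed predictions fall as the percentage rises.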
I come to Graphs, to Chart Builder, to Scatter, where I'm going to put level of education here in the X, and I'm going to put Facebook here in the Y, and I'll just click OK. And it's clear. It's a very strong negative association. The higher the percentage of the population with a bachelor's degree, the lower the relative interest in Facebook as a search term. So the similarities between bivariate correlation and bivariate regression, which we just did, are pretty easy to see in this example.
They both give the same standardized effects and the same P values. The difference is that the regression model also gives the intercept and slope for the model which is a nice piece of information. Also in a later section we'll see how this procedure can be very easily adapted to having several predictor variables, in which case it's called Multiple Regression. And while it's possible to use categorical predictors in linear regression, the basic approach doesn't work well when the outcome variable is categorical. Instead, it's more common to use cross tabulations, which we'll turn to next.
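The point that correlation and bivariate regression give the same standardized effect can be checked numerically. The sketch below is plain Python, not SPSS output, and the five data points are made up purely for illustration; they are not values from Searches.sav. It fits a one-predictor least-squares line, rescales the slope by the two standard deviations, and compares the result with the Pearson correlation.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(x, y):
    """Return the cross-product sum and the two sums of squared deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    sxy, sxx, syy = sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares intercept and slope for one predictor."""
    sxy, sxx, _ = sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


def standardized_slope(x, y):
    """Slope rescaled by the ratio of standard deviations (the beta weight)."""
    _, sxx, syy = sums_of_squares(x, y)
    _, slope = simple_regression(x, y)
    return slope * sqrt(sxx) / sqrt(syy)


# Made-up illustrative data: percent with a degree, and an interest score.
pct_degree = [20, 25, 30, 35, 40]
interest = [1.0, 0.2, 0.1, -0.6, -0.9]

print(simple_regression(pct_degree, interest))   # intercept and slope
print(pearson_r(pct_degree, interest))           # correlation
print(standardized_slope(pct_degree, interest))  # same value as the correlation
```

With a single predictor, the standardized slope and the correlation coincide, while the unstandardized intercept and slope carry the extra information in the original units.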