# Calculating multiple regression

## Video: Calculating multiple regression

In the last movie we covered SPSS's new Automatic Linear Modeling function, which takes a lot of the stress out of statistical analysis. It can also let you control almost everything manually should you so desire. On the other hand, you may be using an older version of SPSS that doesn't have Automatic Linear Modeling, because it's new with version 19, or you may want to include some options in your analysis that it doesn't have, such as Hierarchical Blocking, which I use frequently. In that case, you'll want to turn to SPSS's Standard Linear Regression function, which is what we'll discuss in this movie.

The goal of regression is pretty simple. Take a collection of predictor variables and multiply each of them by a weight called a regression coefficient, which is related to the impact that the variable has on the outcome. Add them all up, and you have predicted scores on a single scale outcome variable. The actual work involved in this process can of course get much more complicated, but the general concepts remain the same. Now in this particular movie, we're going to look at the most basic form of multiple regression, where all of the variables are entered into the equation at the same time.
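As a rough illustration of the weighted sum just described, here is a minimal Python sketch. The weights, the case's scores, and the constant (the term SPSS lists as "(Constant)") are made-up numbers for illustration, not values from the Google search data set.

```python
def predict(intercept, coefficients, predictors):
    """Return the predicted outcome: the constant plus each predictor times its weight."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical constant and regression coefficients for three predictors.
intercept = 0.5
coefficients = [0.25, -0.5, 2.0]

# One case's scores on the same three predictors (also hypothetical).
case = [2.0, 1.0, 0.5]

prediction = predict(intercept, coefficients, case)
print(prediction)  # 1.5
```

The whole job of the regression procedure is to find the weights; once they are known, a prediction is just this multiply-and-add step.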

It is, after all, the variable selection and entry that causes most of the fuss in statistics, and here's how it works. I'm going to be using the same Google Search data set, which is similar to what marketing researchers would use to determine the mind share of particular ideas in Google searches. What we need to do is go up to Analyze and then down to Regression, and we're going to go to the second choice here, Linear. Linear means straight line. It's going to try to put straight lines through the data, and what we need to do is pick our one dependent or outcome variable, the thing that we're trying to predict.

I'll use interest in SPSS as a search term in Google, and then we pick the independent variables, those things that will be used to predict the levels. I'm going to use a bunch of other search terms, from Regression down through FIFA. I'm also going to use some dichotomous variables: whether they have an NFL team, an NBA team, or a Major League Soccer team. Put those in. Scroll down a little bit. The Percentage of the Population with a bachelor's degree or higher, whether they have an outline for high school statistics, the Median Age.

Now in Automatic Linear Modeling I was able to simply include a categorical variable for the Census Bureau region. It has four regions, and that procedure, Automatic Linear Modeling, was able to compensate for the fact that we had four different categories in no particular order. In the Standard Linear Regression we can't do that. The predictors need to be either scale variables (they can't be ordinal variables) or dichotomous 0/1 indicator variables. Now when you have a categorical variable, you don't need the same number of indicator variables as you have categories.

For instance, to indicate gender as either male or female we only need one indicator. In the same way, if we want to indicate four different regions in the United States, we only need three indicator variables, because if a case is zero on all three of them, then the fourth category is implied. So I'm going to use these three indicator variables: Northeast, Midwest, and South. I'm going to add those as well. Now let's come over for just a moment to Statistics and see if there is anything in here that we need for right now, and there isn't. There are times when the R squared change can be a very handy statistic, but we're using what's called Simultaneous Entry, where we put everything in the model at once, so there isn't a possibility of a change.
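The indicator coding described above can be sketched in a few lines of Python. The function name `indicator_code` is hypothetical, and treating West as the implied fourth region is an assumption based on the three indicators named in the movie.

```python
def indicator_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}

# Four Census Bureau regions; West is assumed to be the implied category.
regions = ["Northeast", "Midwest", "South", "West"]

south = indicator_code("South", regions, reference="West")
west = indicator_code("West", regions, reference="West")

print(south)  # {'Northeast': 0, 'Midwest': 0, 'South': 1}
print(west)   # {'Northeast': 0, 'Midwest': 0, 'South': 0}
```

A state in the West gets a zero on all three indicators, which is why a fourth indicator would add no new information.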

I'm going to hit Cancel. These are some diagnostic plots that we could get. I don't think we need any of those. If we wanted to save the predicted scores or other diagnostic statistics, we could do those with the Save menu. We don't need any of these for right now. Let's look at the other options. Now these are criteria that are used for entering and removing variables. Now we're not using an automatic procedure. We're simply entering everything at once. If we wanted to replicate the procedure that was used in Automatic Linear Modeling, we would use a Forward Stepwise Regression and then these criteria for entry would matter.

But now we're not going to worry about them. I'll just press Cancel now. And so really we're just using the defaults. I picked my one dependent variable, which needs to be a scale variable, and then I put in a whole collection of independent variables, and now I'll press OK. And we get a bunch of tables out of this one. The first table, which indicates variables entered and removed, is not helpful. You can just ignore that. The second table, called Model Summary, gives what's called the Multiple Correlation. The capital R in the second column tells you the correlation between all of the predictor variables together and the outcome.

It's an analog of the individual correlation, which is usually lowercase r. This is 0.937, which is a huge correlation, considering it goes from 0 to 1. The R squared is often a better indicator, because you can read it as the proportion of the variance in the outcome that can be predicted by the predictor variables, and 88% is enormous. The next one, the Adjusted R squared, is also sometimes reported. You'll see that it's smaller. This has to do with the ratio of predictor variables to the number of cases.

Now truthfully, I've probably used more predictor variables than I should, because really I only have 51 cases, the 50 states and Washington, DC, but it still works for my purposes. The next table is the Analysis of Variance table, and that provides a statistical hypothesis test for whether the entire model as a whole can predict at better than 0%. And the answer of course is yes. I'm looking at the number that's on the far right under Sig., where it says .000. If that number is less than .05, and this one isn't literally 0, it's just less than .001, then the model is statistically significant as a whole.
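To show how the Model Summary and ANOVA numbers relate to each other, here is a small Python sketch using the usual textbook formulas for R squared, Adjusted R squared, and the overall F statistic. The multiple R of 0.937 and the 51 cases come from the movie; the count of 15 predictors is an assumption made only for illustration, because the transcript never states it.

```python
def model_summary(multiple_r, n_cases, n_predictors):
    """Return (R squared, adjusted R squared, overall F) from the multiple R."""
    r_squared = multiple_r ** 2
    df_residual = n_cases - n_predictors - 1
    # The adjustment shrinks R squared more as predictors approach the number of cases.
    adjusted = 1 - (1 - r_squared) * (n_cases - 1) / df_residual
    # Overall F: explained variance per predictor over unexplained variance per residual df.
    f_stat = (r_squared / n_predictors) / ((1 - r_squared) / df_residual)
    return r_squared, adjusted, f_stat

# n_predictors=15 is hypothetical; 0.937 and 51 are the values quoted in the movie.
r_squared, adjusted, f_stat = model_summary(0.937, n_cases=51, n_predictors=15)

print(round(r_squared, 3))   # 0.878
print(adjusted < r_squared)  # True
```

Turning the F statistic into the Sig. value requires the F distribution, which SPSS handles for you and which this sketch leaves out.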

The table below that gives the actual regression coefficients. You have what are called Unstandardized Coefficients, which are in the original metric. So for instance, if the predictor were years, that says for every year, add this much more to your predicted value. If it were dollars, it says for every dollar, add this much to the predicted value. Now the Google search terms, which are in quotes, are already standardized, but go down to Has an NFL team or Has an NBA team. The coefficient for Has an NFL team is .068, and what that says is: for a state that has an NFL team, add .068 standard deviations to the prediction of its interest in SPSS relative to other terms in Google searches.

Next to those is the standard error, which is an indication of how much each coefficient could vary from sample to sample, and if you take the B weight, or the regression weight, and divide it by the standard error, you get the t statistic a couple of columns over. In between are what are called the standardized coefficients, or beta weights, which rescale each B weight by the standard deviations of the predictor and the outcome. And those are actually really nice, because they are similar to correlations. They usually fall between -1 and 1, they can be positive or negative, and they indicate the degree of a linear relationship. Next to those are the t-tests. Those are individual inferential statistics for each one of the regression coefficients, and next to those is their significance level.
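The relationships among the columns of the coefficients table can be made concrete with a short Python sketch. It uses the usual definitions: t is the B weight divided by its standard error, and the beta weight is the B weight rescaled by the ratio of the predictor's standard deviation to the outcome's. All the numbers and the function names are hypothetical, not output from the movie.

```python
def t_statistic(b, std_error):
    """t for one coefficient: the B weight divided by its standard error."""
    return b / std_error

def beta_weight(b, sd_predictor, sd_outcome):
    """Standardized coefficient: B rescaled into standard-deviation units."""
    return b * sd_predictor / sd_outcome

# Hypothetical values for a single predictor.
b = 2.0
std_error = 0.5
sd_predictor = 3.0
sd_outcome = 12.0

print(t_statistic(b, std_error))                 # 4.0
print(beta_weight(b, sd_predictor, sd_outcome))  # 0.5
```

When the predictor and the outcome are both already standardized, the two standard deviations are 1 and the B weight and the beta weight coincide.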

So we can go down to that column at the end, the Significance levels, and look for ones that are less than .05. We see for instance that Regression is a statistically significant predictor of interest in SPSS as a search term, and so is one other term. And if we scroll down, we see that really those are the only two in that collection that do it. Now you may recall that in Automatic Linear Modeling we had three or four that mattered, but that's because it used a different procedure, where it was selective about what it entered, and it also had a different criterion: it was looking at the overall change in the information criterion.

This time we're just using probability values for individual regression coefficients. Now a really important thing here: I said the beta coefficients are like correlation coefficients. That's true up to a point, but the big difference is that correlation coefficients are valid on their own. Each correlation coefficient is calculated separately with the outcome. These, however, are only valid taken as a group; each one of them influences the others. So the betas can be very different from the correlation coefficients, and it can be helpful to compare the two.
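A two-predictor case makes the difference between betas and correlations easy to see. The Python sketch below uses the standard closed-form expression for the standardized coefficients when there are exactly two predictors; the correlation values are invented for illustration.

```python
def betas_two_predictors(r_y1, r_y2, r_12):
    """Standardized betas for two predictors, from the three pairwise correlations.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Uncorrelated predictors: each beta equals the plain correlation.
print(betas_two_predictors(0.5, 0.25, 0.0))  # (0.5, 0.25)

# Correlated predictors: the second beta drops to zero, although that
# predictor still correlates 0.25 with the outcome on its own.
print(betas_two_predictors(0.5, 0.25, 0.5))  # (0.5, 0.0)
```

In the second case the second predictor carries no information beyond what the first already provides, which is exactly the kind of thing you notice by setting the betas next to the correlations.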

This is the most basic version of multiple regression. It doesn't have to be an impossibly complicated rocket science affair. Instead, it can serve as a quick insight into what could be a large and very complicated data set. It can give you some real clarity to start with. The Automatic Linear Modeling function can do a lot of this and a lot more without too much direction from you, but there are situations where you would want to use the legacy command, and I especially find the standardized coefficients to be priceless, because I can compare them with correlation coefficients.
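For readers who want to see what entering every predictor at once amounts to outside SPSS, here is a minimal pure-Python sketch of an ordinary least-squares fit using the normal equations. It is an illustration under simplifying assumptions, not a description of how SPSS computes its results; the function names and the tiny data set are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to reduce rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]  # fails if predictors are perfectly collinear
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, y):
    """Fit all predictors simultaneously; returns [constant, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 column gives the constant
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 with no noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]

coefficients = fit_ols(rows, y)
print([round(c, 6) for c in coefficients])
```

Because the data were built from known weights, the fitted constant and coefficients should come back as approximately 1, 2, and -3.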

I recommend that you take a little time and see how SPSS's linear regression feature can help you deal with the complexities of your own data.

#### This video is part of

SPSS Statistics Essential Training (2011)

52 video lessons · 19977 viewers

