
Using Automatic Linear Models

From: SPSS Statistics Essential Training (2011)

Video: Using Automatic Linear Models

In the last section, we looked at ways to chart the relationship of three or more variables at a time. In this section, we'll look at ways to give precise numerical descriptions to those relationships as well as inferential tests to check the reliability of our numbers. The very first procedure that we're going to cover here is one of the most impressive features that SPSS has added for version 19. It's called Automatic Linear Modeling. It's a huge step towards making data analysis a little easier, a little more accurate, and a lot more interpretable for a lot more people.

Don't worry if you have an earlier version of SPSS. In the next video, I'll also show you how to accomplish the same goals using procedures that are available in every version of SPSS. The starting point for SPSS's Automatic Linear Modeling function, and for linear regression in general, is an entire group of predictor variables. These can be scale variables, ordinal variables, or dichotomous indicator variables, that is, 0/1 variables. You can even use variables with multiple group categories if you break them down into a series of dichotomous variables.
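As a rough illustration of breaking a multi-category variable into a series of dichotomous variables, here is a minimal Python sketch. The function name `dummy_code` and the five region labels are made up for illustration; this is not how SPSS does it internally, just the general idea of 0/1 indicator coding with one category held back as the reference.

```python
def dummy_code(values, reference):
    """Turn one categorical column into a set of 0/1 indicator columns.

    The reference category gets no column of its own; a row in that
    category is represented by zeros in every indicator.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical region labels for five states (illustrative values only).
region = ["Northeast", "South", "West", "Midwest", "South"]
indicators = dummy_code(region, reference="West")

for name, column in indicators.items():
    print(name, column)
```

With West as the reference category, a four-category region variable needs only three indicators, which matches the Northeast, Midwest, and South indicators mentioned later in the video.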

But the goal of linear regression is to take these predictors and find the best way to combine them to predict values on a single scale outcome variable. While the mathematics behind this can get very involved and there are plenty of decisions that can be made, the Automatic Linear Modeling procedure has been developed to keep most of that in the background and to let you focus on interpreting your data. This is how it works. To get to Automatic Linear Modeling, we first go to Analyze, then down to Regression, and then over to Automatic Linear Modeling, which is the first choice.

From this, SPSS takes the information that we gave it about the variables: whether they were predictors (that is, input variables), whether they were targets, or whether they were both. So this is a situation where the role that we gave a variable in the dataset makes a difference in how things work out. The first thing we need to do is pick our target variable. I'm going to use searches for the term SPSS. That will be my target variable. Now, it's going to ask me what I want my predictor variables to be.

I'm going to add a bunch of these ones about other searches in Google. I can put those in here. I can leave those in with the other indicators about whether they have an NFL team, or an NBA team, or a Major League Soccer team. I can have this information about Census Bureau Region. I'm going to remove these four about Census Bureau Division, because those are just subcategories of the region. Then these three, Northeast, Midwest, and South, are indicator variables that I use for the region.

However, the nice thing about Automatic Linear Modeling is that you can put in categorical variables with several categories and it will break them up in the way that makes the best sense for the data. So you can leave categorical variables in there as they are. I don't need these dichotomous ones as a backup. So this is the list of potential variables that I can use as predictors to try to predict the relative importance, by state, of SPSS as a search term in Google. I'm then going to come up here to Build Options.

It asks what our objective is, and we're going to create a standard model. That's what we're going to do. The other ones, called Boosting, Bagging, and Large Datasets, are technical things that we don't need to worry about. However, I am going to come to Basics, and this is asking me whether I want it to automatically prepare data, and truthfully, this is a wonderful thing. It's a great way to deal with outliers, to transform variables, and to make substitutions, and it's one of the big perks of the Automatic Linear Modeling approach. The next thing I'm going to go to is Model Selection.

This is where things can get very complicated in regression. It's asking for the Model Selection Method, that is, how it decides which variables to put into the regression model. I have several options. There is Forward Stepwise. I also see one that says just put them all in and leave them there, and another one called Best Subsets. Now, when we get to the Linear Regression command that's separate from this one, you'll see that we have some different options. I'm just going to leave this at Forward Stepwise, because it can make life a little bit simpler.

There is also an issue here about what criterion it wants to use. There are several choices here: the AICc, which is the Information Criterion, the F-statistic, and adjusted R-squared. Let's not worry about that. Let's just use the Information Criterion. Then we can ignore these other options, and these ones about Ensembles and Advanced we can also just ignore. So the last thing I need to do is go to Model Options, and we don't need to worry about these options. We can just leave the defaults here. So now we can come down to the bottom and press Run to see what it gives us.
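To make the Information Criterion a little less mysterious, here is a small Python sketch of a common textbook form of the corrected Akaike information criterion (AICc) for a least-squares model. This is an assumption about the general shape of the statistic, not a claim about the exact formula SPSS uses, and the sums of squares below are invented numbers.

```python
import math


def aicc(sse, n, k):
    """Corrected Akaike information criterion for a least-squares model.

    sse: sum of squared residuals
    n:   number of cases
    k:   number of estimated parameters (including the intercept)
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the small-sample correction")
    aic = n * math.log(sse / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical numbers: 51 cases, two candidate models.
small = aicc(sse=18.0, n=51, k=2)   # fewer predictors, larger error
large = aicc(sse=12.0, n=51, k=3)   # one more predictor, smaller error
print(round(small, 2), round(large, 2))
```

Lower is better. The value can be negative whenever the average squared residual is below 1, which is typical when the outcome is on a standardized scale, so a negative criterion is nothing to worry about.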

Automatic Linear Modeling produces this one small chart and it doesn't look like a huge amount, but this is a Model Viewer. When you click on it, it's interactive and it does a lot of other things. So I'm going to double-click on this to open up what's called the Model Viewer window. Maximize that. What you see here is first it says what's the target variable, the thing that we're trying to predict, and that is SPSS and its relative importance as a search term in Google on a state-by-state basis.

The Model Summary also tells us that it's using automatic data preparation and a Forward Stepwise model selection method for deciding which variables go into the model. Now, the bottom one, the information criterion, has a number. That's not really meaningful in and of itself, but the lower the number, the better the prediction. Here we have negative numbers, so the greater the absolute value of the negative number, the better. Beneath that, it shows that we're able to predict with about 79% accuracy in this model. So that's good.

What I'm going to do now is come over to the little list of thumbnails on the left and start going through these one at a time. The first is the one we're at right now. The second one shows what the Automatic Data Preparation did: we have a lot of outliers, and it has trimmed the outliers. Actually, it didn't really trim them, because trimming means throwing away that data. Instead, technically what SPSS did is something called Winsorising, where it takes the outlier scores and simply replaces them with the highest or lowest non-outlier scores.
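Here is a minimal Python sketch of Winsorising as just described: outlier scores are replaced by the highest or lowest non-outlier score rather than being thrown away. The outlier rule (more than `cutoff` standard deviations from the mean) and the sample data are assumptions for illustration, and SPSS's own data preparation may use a different rule.

```python
import statistics


def winsorize(values, cutoff=3.0):
    """Pull outliers in to the nearest non-outlier value.

    A value counts as an outlier here if it lies more than `cutoff`
    standard deviations from the mean (an assumed rule for illustration).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - cutoff * sd, mean + cutoff * sd
    inside = [v for v in values if low <= v <= high]
    floor, ceiling = min(inside), max(inside)
    # Clamp every value into the range spanned by the non-outliers.
    return [min(max(v, floor), ceiling) for v in values]


# Hypothetical scores with one extreme case at the end.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
print(winsorize(scores, cutoff=2.0))
```

The example uses a cutoff of 2 because, with only ten cases, no single value can sit more than about 2.85 standard deviations from the mean. Note that the number of cases stays the same; only the extreme value changes.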

So it brings them in. This is a not uncommon practice in business settings, so it's a nice way to do it. Also, when we have categorical variables like the Region, SPSS is able to merge categories in a way that maximizes their predictive value. So that's a nice thing. So that's what the Automatic Data Preparation has done. The third window shows us what's called Predictor Importance. Predictor Importance is actually a rather sophisticated statistical calculation.

There are a number of things that go into it. It's not just a matter of probability values. It's not just a matter of correlations with the outcome. There is much more to it than that. But the relative importance is a very easy thing to understand. What this is telling us is that there are three variables that have a lot of importance in explaining the levels of relative interest in SPSS as a Google search term. The first is the use of Regression as a search term. That's not surprising, because that's a major thing that SPSS is used for.

The second one, amazingly, is Totally Lost, which seems to show up a lot with SPSS. The third one is the percent of the population with a Bachelor's degree or higher. So these are the three major variables. We're going to have more about those. The next chart is the Diagnostic Plot. It lets us know the observed value of SPSS interest for each of the 50 states and Washington, D.C., along with its predicted value. The idea here is that the observed and the predicted values should be pretty close together. Otherwise, we don't need to worry about this.

This is a histogram of Residuals. That's how far off the predictions were. Again, if we had a thing that looked really unusual here like a big spike at one end or the other, we might have a problem, but we're not going to worry about this one. I'm going to scroll down a little and I'll go to the next little page. This is a list of particular outliers and it tells us what their score was. For instance we had one place that had a score on SPSS of 3.364 and what that means is that state showed a relative interest in SPSS as a Google search term that was 3.364 standard deviations above the national average.

There is another related measure called Cook's Distance. This doesn't necessarily mean that these were outliers in an absolute sense, but they are the most extreme cases. The next one down is a graph of the effects of the various predictor variables. We have Regression as a search term, but transformed because it's had the outliers removed, then Totally Lost, and then Degree, which was also transformed by removing outliers. This is a Diagram View. You can also get a Table View, and you can even expand this to see the various terms.

If you need an analysis of variance table for whatever purpose, here it is. I'm going to skip over to the next box, and here we have coefficients. The coefficients are the actual numbers that you multiply the predictor values by. The Intercept is in there, and then we have Regression, Totally Lost, and Degree. Please note that the Degree one is a different color because it's a negative coefficient. This becomes clearer if we come down and, instead of the diagram, look at the table. Here, we can now see the coefficients.

The Intercept, that is, the standard value that we give to everybody, is 0.87. So we start by assuming that a state is 0.87 standard deviations above the mean in its interest in SPSS. Then for every standard deviation above the mean on Regression, we add another half of a standard deviation. For every standard deviation above the mean on Totally Lost, we add a little over a half, 0.58. On the other hand, for every percentage point of the population that has a Bachelor's degree or higher, we subtract 0.03 standard deviations, and so this is another way of looking at the relative contribution of the variables.
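The way these coefficients combine into a prediction can be sketched in a few lines of Python. The coefficient values are the rounded ones read out above, with "another half" taken as 0.50, which is an approximation. The function name and the example state are hypothetical, and the sketch ignores the outlier transformations that the automatic data preparation applied to the predictors.

```python
# Coefficients as read aloud in the video (rounded; "another half" is
# taken as 0.50, which is an approximation).
INTERCEPT = 0.87
B_REGRESSION = 0.50
B_TOTALLY_LOST = 0.58
B_DEGREE = -0.03


def predict_spss_interest(regression_z, totally_lost_z, pct_degree):
    """Predicted interest in SPSS, in standard deviations from the mean.

    regression_z and totally_lost_z are search-interest scores in
    standard deviations; pct_degree is the percent of the population
    with a Bachelor's degree or higher.
    """
    return (INTERCEPT
            + B_REGRESSION * regression_z
            + B_TOTALLY_LOST * totally_lost_z
            + B_DEGREE * pct_degree)


# Hypothetical state: one SD above the mean on Regression, average on
# Totally Lost, and 25% of the population with a degree.
example = predict_spss_interest(regression_z=1.0, totally_lost_z=0.0, pct_degree=25.0)
print(round(example, 2))
```

For this made-up state the pieces are 0.87 + 0.50 + 0 - 0.75, so the Degree term pulls the prediction down even though its coefficient looks small, because it is multiplied by a percentage rather than by a standardized score.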

I am going to scroll down a little further. We have another one here that gives estimated means charts, and these are straight lines, because these are just the slopes of the lines that we get from the coefficients. I don't think there is anything terribly important there, so I'll skip to the next one. This is a table that shows us the three variables that got included, and across the top is the information criterion, and you can see that the number goes down. It starts at -52, and when it adds Totally Lost, it goes to -73. Then it adds Degree and it goes down to -75. That was the criterion for deciding whether to include a variable: whether it lowered the value of the information criterion.
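The selection rule just described, adding a variable only if it lowers the information criterion, can be sketched as a simple loop in Python. This is a simplified illustration of forward selection, not SPSS's actual procedure (which may also remove variables along the way). The three values -52, -73, and -75 come from the table in the video; every other number in the lookup table, and the NFL entries, are made up so the example can run.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection: keep adding the predictor that lowers
    the criterion the most, and stop when no addition lowers it."""
    selected = []
    best = score(selected)
    while True:
        trials = {c: score(selected + [c]) for c in candidates if c not in selected}
        if not trials:
            break
        winner = min(trials, key=trials.get)
        if trials[winner] >= best:
            break  # no remaining predictor improves the criterion
        selected.append(winner)
        best = trials[winner]
    return selected, best


# Criterion values by predictor set. Only -52, -73, and -75 are from the
# video; all other values are hypothetical.
TABLE = {
    frozenset(): 0,
    frozenset({"Regression"}): -52,
    frozenset({"Totally Lost"}): -40,
    frozenset({"Degree"}): -10,
    frozenset({"NFL"}): -1,
    frozenset({"Regression", "Totally Lost"}): -73,
    frozenset({"Regression", "Degree"}): -55,
    frozenset({"Regression", "NFL"}): -51,
    frozenset({"Regression", "Totally Lost", "Degree"}): -75,
    frozenset({"Regression", "Totally Lost", "NFL"}): -72,
    frozenset({"Regression", "Totally Lost", "Degree", "NFL"}): -74,
}

chosen, final = forward_stepwise(
    ["Regression", "Totally Lost", "Degree", "NFL"],
    score=lambda predictors: TABLE[frozenset(predictors)],
)
print(chosen, final)
```

In this made-up table the NFL indicator never lowers the criterion, so the loop stops after three predictors, which mirrors the three-variable model in the video.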

The very last thing is just a quick summary. You can click on to see what got included and what the options were. Just a quick written summary of the entire model. So the Automatic Linear Modeling function in SPSS is a fabulous option for those who want to make a sophisticated analysis and have thorough reporting options without having to make a million decisions on their own. It makes it much, much easier to sift through a large dataset and see what useful patterns might emerge.

I encourage you to spend some time to check out all of its options because there is more than I've covered here and explore how it might be able to help you in understanding your own data.

This video is part of

SPSS Statistics Essential Training (2011)

52 video lessons · 20152 viewers

Barton Poulson
Author

 
  1. 2m 58s
    1. Welcome
      1m 5s
    2. Using the exercise files
      40s
    3. Using a different version of the software
      1m 13s
  2. 19m 0s
    1. Taking a first look at the interface
      11m 49s
    2. Reading data from a spreadsheet
      7m 11s
  3. 21m 54s
    1. Creating bar charts for categorical variables
      7m 18s
    2. Creating pie charts for categorical variables
      2m 54s
    3. Creating histograms for quantitative variables
      5m 45s
    4. Creating box plots for quantitative variables
      5m 57s
  4. 33m 10s
    1. Recoding variables
      5m 33s
    2. Recoding with visual binning
      5m 33s
    3. Recoding by ranking cases
      5m 26s
    4. Computing new variables
      5m 37s
    5. Combining or excluding outliers
      5m 21s
    6. Transforming outliers
      5m 40s
  5. 28m 12s
    1. Selecting cases
      6m 44s
    2. Using the Split File command
      5m 12s
    3. Merging files
      5m 33s
    4. Using the Multiple Response command
      10m 43s
  6. 22m 14s
    1. Calculating frequencies
      8m 43s
    2. Calculating descriptives
      5m 31s
    3. Using the Explore command
      8m 0s
  7. 16m 3s
    1. Calculating inferential statistics for a single proportion
      6m 6s
    2. Calculating inferential statistics for a single mean
      5m 39s
    3. Calculating inferential statistics for a single categorical variable
      4m 18s
  8. 30m 43s
    1. Creating clustered bar charts
      7m 10s
    2. Creating scatterplots
      5m 8s
    3. Creating time series
      3m 24s
    4. Creating simple bar charts of group means
      4m 17s
    5. Creating population pyramids
      3m 0s
    6. Creating simple boxplots for groups
      3m 3s
    7. Creating side-by-side boxplots
      4m 41s
  9. 45m 28s
    1. Calculating correlations
      8m 17s
    2. Computing a bivariate regression
      6m 27s
    3. Creating crosstabs for categorical variables
      6m 34s
    4. Comparing means with the Means procedure
      6m 33s
    5. Comparing means with the t-test
      6m 4s
    6. Comparing means with a one-way ANOVA
      6m 30s
    7. Comparing paired means
      5m 3s
  10. 24m 30s
    1. Creating clustered bar charts for frequencies
      6m 34s
    2. Creating clustered bar charts for means
      3m 45s
    3. Creating scatterplots by group
      4m 13s
    4. Creating 3-D scatterplots
      4m 25s
    5. Creating scatterplot matrices
      5m 33s
  11. 30m 57s
    1. Using Automatic Linear Models
      11m 52s
    2. Calculating multiple regression
      9m 3s
    3. Comparing means with a two-factor ANOVA
      10m 2s
  12. 29m 29s
    1. Formatting descriptive statistics
      6m 1s
    2. Formatting correlations
      7m 49s
    3. Formatting regression
      10m 19s
    4. Exporting charts and tables
      5m 20s
  13. 51s
    1. What's next
      51s
