In this video, learn how to run the PROC GLM code reviewed earlier and how to read the output. The ANOVA table, sums of squares, and F-test results are also reviewed. Identifying parameter estimates for both simple and multiple linear regression—including the intercept, slope estimates, and the standard error, t-value, and p-value for each slope in the output—is covered as well.
- [Instructor] In this movie, we are going to run a little PROC GLM code and get familiar with reading the output. This code should look familiar. I was showing it to you in the last movie, and it is in your exercise files for this movie too. It's called 505_linearmodel one and two. Okay, here's our PROC GLM with the model statement specifying a simple linear regression model with our continuous outcome SLEPTIM1 and our exposure, DIABFLAG, as the independent variable. Let's start by highlighting this, running it, and looking at the output. For now, let's not worry about what the results are; let's just look at what is being reported. You'll see here at the very top is the number of observations used in the GLM. There are a few reasons to check this. One is that any record with missing data in the dependent variable won't be included in the regression. So, if SLEPTIM1 was blank in 100 records, this number would be 100 less than our total dataset, and we might notice, if we haven't noticed by now. Another reason is you want to make sure you are analyzing the correct dataset, and just looking at the number of observations can clue you in as to whether you've loaded the correct dataset into the GLM procedure. Okay, let's look at the next output table. This is the famous ANOVA table. Now, you will remember why we did PROC GLM in the part one course: we needed to do an ANOVA. At that time, we only paid attention to this table, and we basically ignored the rest of the output. In this course, we need to look at this table and also other parts of the output. If you remember your statistics 101, if you are doing a linear regression, you must first run an ANOVA and check the P value on the F test. If it is statistically significant, less than 0.05, then you can go on and interpret your linear regression. Look, our F test P value is significant, so we could actually interpret our regression results. Good to know. This next table is a little bit of a grab bag.
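The simple linear regression just run can be sketched in SAS as follows. The dataset name MYDATA is an assumption; the actual code is in the exercise file mentioned above.

```sas
/* Sketch of the unadjusted model: sleep time regressed on the
   diabetes flag. MYDATA is a placeholder dataset name. */
proc glm data=mydata;
	model SLEPTIM1 = DIABFLAG;
run;
quit;
```

Running this produces the output walked through above: the number of observations read and used, the ANOVA table with the F test, the fit-statistic table, the Type I and Type III sums of squares, and the parameter estimates.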
You have R square, the coefficient of variation, the root mean square error, and the mean. What I find most important about this table is the R square. That's your model fit statistic. Okay, these two tables look identical, but that's just because I only put one independent variable in the regression. These tables are your type one sums of squares and your type three sums of squares. When I'm doing linear regression on big data, these two tables fall in the category of "I'm ignoring you," because we don't really need the info in them. Here's the main event, the table we've all been waiting for. This last table is the one that provides the linear regression parameters. Let's look at the columns. Here is where you will see the value of the slope for each covariate. Actually, for the first line it's the value of the y intercept, but the rest of the lines report slopes. That's what estimate means. The other three columns are the standard error of the estimate, the t value on that estimate when tested against zero, and finally the P value on the t value. This P value is not important for the intercept, but it is important for the independent variables in the model. We only have one here, so let's look at it. We see the slope is positive, and the P value is less than 0.05, so we'd say DIABFLAG statistically significantly explains the variation in SLEPTIM1 according to this unadjusted model. Let's go see what happens if we add more independent variables. Okay, let's try running this model where we have not only DIABFLAG, but MALE for the sex category and all the age indicator variables for the age category. Let's highlight and run, then look at the output. Already you see the output got way longer because we added all those independent variables. But the ANOVA table and the next table with the model fit statistics are still the same size. We have a statistically significant F test, but that's normal for big data.
I can hardly get an F test that's not significant with big data, no matter what the model. Let's scroll down past the sums of squares to the parameter estimates table at the end. See, now each independent variable has its own estimate, meaning its own slope. Some are negative, some are positive. And each independent variable has its own P value. Some are statistically significant, like DIABFLAG, AGE4, AGE5, and AGE6. And some are not, like MALE, AGE2, and AGE3. Now that you have a feel for the PROC GLM output, let's move on to the next chapter, where we use PROC GLM to answer our hypothesis through the development of a final linear regression model.
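The adjusted model run above, with the sex and age indicators added, can be sketched like this. The dataset name MYDATA is a placeholder, and the indicator variable names (MALE, AGE2 through AGE6) are assumed from the narration; check the exercise file for the exact spellings.

```sas
/* Sketch of the adjusted model: diabetes flag plus sex and age
   indicator variables. MYDATA is a placeholder dataset name. */
proc glm data=mydata;
	model SLEPTIM1 = DIABFLAG MALE AGE2 AGE3 AGE4 AGE5 AGE6;
run;
quit;
```

Each variable listed on the right side of the equals sign in the model statement gets its own row in the parameter estimates table, with its own slope, standard error, t value, and P value.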