In the second round of stepwise selection in logistic regression, covariates that did not survive round 1 are tried again in the model iteratively. The code demonstrated shows several improvements made to the round 1 working model prior to settling upon a final model. Formatting syntax for the final model is demonstrated.
- [Instructor] Hi there, see this code on the screen? It was the code I was using at the end of the last movie. It's called 605_Stepwise Selection Logistic Round One, and it's in your exercise files. I just wanted to revisit it one last time so we can go over my final round one working model. See what I have in this model? Let's go through it. I have Diabflag, Male, Othrace, that's other race, then this variable Somecoll, which means some college. The reference group for this variable then includes all other educational levels. On the next line, we have Inc one, Inc two, Inc three and Inc four. Those are income levels. The levels that dropped out of the model then are in the reference category. We have obese, that's one of the levels from our body mass index categories, we have smoker, our smoking status variable, and we have two levels from our general health variable, fair health and poor health.
Okay, let's start modeling round two, so we'll go to the next exercise file. See the name of this file? It's 610_Stepwise Selection Logistic Round Two. And see at the top? Our first model, model 23, looks exactly like the last model except, look at that last line. Why do you think I put age two on the last line? Well, remember how we kicked out all the age indicator variables in round one? Now they get their time in the spotlight again. But because I already have a big bulky model, I'm going to try to add back each indicator variable one at a time. So I am trying to squeeze age two into my working model first. Let's highlight and run this model and see how that goes for me.
Wow! Our model is getting big. Let's look at the end of it where we just added age two. Well, this p-value is clearly greater than 0.05. It's over twice 0.05, so that's not statistically significant. So let's kick out age two and try to add age three instead. Let's go back to our code. Actually, let's just see what I did in this code.
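To make the add-back step concrete, here is a minimal sketch in plain Python. It is not the code from the exercise files: the data are simulated, the `outcome` column and the helper names (`invert`, `fit_logistic`, `try_add_back`) are made up for illustration, and only the covariate names `obese` and `age2` echo the ones mentioned above. It fits a logistic regression by Newton-Raphson, then checks whether the newly added covariate has a Wald p-value below 0.05.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        pv = a[c][c]
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def gradient_and_information(rows, y, beta):
    """Score vector and information matrix of the logistic log-likelihood."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        z = sum(b * v for b, v in zip(beta, r))
        z = max(-30.0, min(30.0, z))  # guard against overflow in exp
        p = 1.0 / (1.0 + math.exp(-z))
        for i in range(k):
            grad[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * r[i] * r[j]
    return grad, info


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Return (coefficients, two-sided Wald p-values); index 0 is the intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad, info = gradient_and_information(rows, y, beta)
        cov = invert(info)
        step = [sum(cov[i][j] * grad[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = gradient_and_information(rows, y, beta)
    cov = invert(info)  # estimated covariance of the coefficients
    pvals = [math.erfc(abs(beta[i] / math.sqrt(cov[i][i])) / math.sqrt(2.0))
             for i in range(k)]
    return beta, pvals


def try_add_back(data, outcome, working, candidate, alpha=0.05):
    """Fit working model + one candidate; report the candidate's p-value."""
    names = working + [candidate]
    X = [[row[n] for n in names] for row in data]
    y = [row[outcome] for row in data]
    _, pvals = fit_logistic(X, y)
    p = pvals[-1]  # the candidate is the last column
    return p, p < alpha


# Simulated data (an assumption, not the course data set):
# obese raises the odds of the outcome, age2 has no true effect.
random.seed(0)
data = []
for _ in range(1000):
    obese = 1.0 if random.random() < 0.3 else 0.0
    age2 = 1.0 if random.random() < 0.2 else 0.0
    prob = 1.0 / (1.0 + math.exp(-(-1.5 + 1.2 * obese)))
    data.append({"obese": obese, "age2": age2,
                 "outcome": 1 if random.random() < prob else 0})

p, keep = try_add_back(data, "outcome", ["obese"], "age2")
print(f"age2 p-value = {p:.3f}; keep in working model: {keep}")
```

Trying the next indicator is then just another call to `try_add_back` with a different candidate name, leaving the working list unchanged whenever a candidate is rejected.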
You can see me here in model 24 trying age three, but I guess it didn't stay because then in model 25, I try age four without age three. Here's model 26. I wrote age five included in new working model. I guess age five was significant. Let's highlight and run model 26 and look at it. Here is the p-value on the parameter. Sure enough, all these variables are significant. Let's keep age five in the model from now on. Age five, you lucky dog! You made it back in.
Let's go back to the code. Now age five got lucky, but if we add covariates to subsequent models, and those covariates affect the model such that age five becomes not significant, then it won't stay. It's pretty cutthroat in the second round of stepwise selection. See, in model 27, I'm now keeping age five but trying to add back another age covariate. In fact, you can see by the comments that it takes me until model 32 to decide on keeping age six and not age five. And then after I add age six, I want to remove smoker, which was in the original working model. Let's highlight and run model 32, and see why I wanted to remove smoker. See these p-values? Notice how age six is very significant but smoker is totally not significant anymore. This is just an example of the ups and downs of logistic regression modeling. You really do not know where you are going to end up until you are done.
Let's go back to our code. Okay, this code is my example of how I got to my final model. And you can piece together my story with my comments. Let's go to the very end of the code. Wow! See this? I ran 89 models to get to my final model, and model 75 is my final model. I encourage you to try this whole exercise yourself and see if you come up with the same model I did or not. Where the big difference will lie is in the interpretation, but for now, we will just want to practice using our stepwise selection process to arrive at a final model. It's important to keep track of all of our models, all 89 of them in my case.
So in the next movie, we'll talk about updating our model metadata.
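As a small illustration of the bookkeeping described above, the sketch below re-checks every covariate after a new one is added and records the decision in a running log. The `ModelRecord` class and `review` function are hypothetical names, and the p-values are placeholders chosen only to mirror the model 32 story (age six in, smoker out); they are not the course output.

```python
from dataclasses import dataclass

ALPHA = 0.05  # significance threshold used throughout the walkthrough


@dataclass
class ModelRecord:
    number: int        # model number, as in the code comments
    covariates: list   # covariates included in this model
    pvalues: dict      # p-value for each covariate
    note: str = ""     # what was decided and why


def review(pvalues, alpha=ALPHA):
    """Split covariates into those that stay and those that drop out."""
    keep = [name for name, p in pvalues.items() if p < alpha]
    drop = [name for name, p in pvalues.items() if p >= alpha]
    return keep, drop


log = []

# Placeholder p-values (assumed for illustration, NOT real results).
pvals = {"obese": 0.001, "age6": 0.002, "smoker": 0.40}
keep, drop = review(pvals)
log.append(ModelRecord(number=32, covariates=list(pvals), pvalues=pvals,
                       note=f"keep {keep}; drop {drop}"))

for record in log:
    print(record.number, record.note)
```

Keeping one record per model, including the rejected ones, makes it possible to retrace how the final model was reached.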