Join Monika Wahi for an in-depth discussion in this video, Beginning Model 3, part of Healthcare Analytics: Regression in R.
- [Instructor] This movie will describe how we begin to use the forward stepwise modeling process to make Model 3. In this movie, I'll first start with a little review. I'll review our model specifications for Models 1, 2 and 3. I'll also review how the forward stepwise modeling process goes. And I'll review our approach to making decisions about what covariates to keep and which to discard as we proceed through that process. Then we'll begin our modeling. I brought back the slide from an earlier movie just to remind you what we are doing.
Our goal is to develop Model 3, the fully adjusted model. This will include all covariates that survive our forward stepwise modeling process. In the prerequisite course, we made all our candidate confounding variables. Our goal with our modeling process for Model 3 is to see which of those variables should be included in our final model. You've seen this slide before too. You'll notice I put done after the first two lines, as we did those. We already ran Model 1 and Model 2 and recorded the results on our metadata.
Now you will see us making Model 3. Our first step will be to figure out what model to start with. I would recommend starting with Model 2 but removing the covariates that were not significant. Remember how male and some of the age variables were not significant? Also, drink monthly was not significant. So we can start by running a model with just the significant covariates. But then, after running that model and documenting the results on our metadata table, we have to decide what to add next. I'm usually guided by my data dictionary or my completed table one.
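The starting point described above can be sketched in R roughly as follows. This is only an illustration on simulated data, not the course's code file: the data frame `dat`, the outcome `sleep_hours`, and covariate names such as `drink_weekly`, `age2`, and `age3` are all hypothetical stand-ins, and the 0.05 cutoff is an assumed significance level.

```r
# Simulated stand-in data. Every variable name here is hypothetical.
set.seed(42)
n <- 1000
age_cat <- sample(1:3, n, replace = TRUE)
dat <- data.frame(
  drink_weekly  = rbinom(n, 1, 0.3),
  drink_monthly = rbinom(n, 1, 0.3),
  male          = rbinom(n, 1, 0.5),
  age2          = as.integer(age_cat == 2),
  age3          = as.integer(age_cat == 3)
)
dat$sleep_hours <- 7 - 0.8 * dat$drink_weekly + 0.6 * dat$age3 + rnorm(n)

# A stand-in for Model 2: exposure plus a first set of covariates.
model2 <- lm(sleep_hours ~ drink_weekly + drink_monthly + male + age2 + age3,
             data = dat)

# Pull the p-value column from the coefficient table, dropping the intercept.
coefs <- summary(model2)$coefficients
p_values <- coefs[-1, "Pr(>|t|)"]
print(round(p_values, 4))

# Covariates below the assumed 0.05 cutoff.
significant <- names(p_values)[p_values < 0.05]

# Always keep the exposure, then build the starting model from what is left.
keep <- union("drink_weekly", significant)
base_formula <- reformulate(keep, response = "sleep_hours")
model3_start <- lm(base_formula, data = dat)
summary(model3_start)
```

Reading the p-values off `summary()` by eye, as in the video, works just as well. The extraction here simply makes the "keep only what was significant" step explicit.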
Let's look at those. See the variables we made here? For example, we have smoker. That's a variable we could try next. We could also try hispanic or a set of race variables. Maybe a more organized way to go about it is to refer to your completed table one, as we made in the last course. Let's review that. Table one will list the same confounders as the data dictionary, just in a different format. And also you can look at the results in table one to guide you.
The point is that whatever you choose to add next, you will make another model, run it, and look at the results. Then you'll document those results on the metadata table. Then of course you ponder: what do I do next? Do I take out something I just added? Do I add another covariate? The rule is you have to keep the exposure in, but you want to remove the nonsignificant covariates. But it's your call; you can keep nonsignificant covariates if they are on the line, meaning close to significant. Let's walk through that process.
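The decision rule just described (the exposure always stays, nonsignificant covariates are candidates for removal, borderline ones are a judgment call) could be sketched as a small helper. The function `review_covariates`, the variable names, and the cutoffs of 0.05 and 0.10 are assumptions for illustration only; the course does not define a numeric "on the line" threshold.

```r
# Hypothetical helper: flags each non-exposure covariate as keep, borderline, or drop.
review_covariates <- function(model, exposure, alpha = 0.05, borderline = 0.10) {
  coefs <- summary(model)$coefficients
  # The intercept and the exposure are never candidates for removal.
  to_review <- setdiff(rownames(coefs), c("(Intercept)", exposure))
  p <- unname(coefs[to_review, "Pr(>|t|)"])
  suggestion <- ifelse(p < alpha, "keep",
                       ifelse(p < borderline, "borderline: your call", "drop"))
  data.frame(term = to_review, p_value = p, suggestion = suggestion,
             stringsAsFactors = FALSE)
}

# Simulated stand-in data. Every variable name here is hypothetical.
set.seed(1)
n <- 1000
dat <- data.frame(
  drink_weekly = rbinom(n, 1, 0.3),
  smoker       = rbinom(n, 1, 0.2),
  male         = rbinom(n, 1, 0.5)
)
dat$sleep_hours <- 7 - 0.8 * dat$drink_weekly - 0.5 * dat$smoker + rnorm(n)

fit <- lm(sleep_hours ~ drink_weekly + smoker + male, data = dat)
review <- review_covariates(fit, exposure = "drink_weekly")
print(review)
```

The helper only makes suggestions. The final call on a borderline covariate still belongs to the analyst, as described above.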
This is a long code file called 255_Linear regression modeling. You'll understand why it's so long after you run a few models and I show you what we are doing. As you can see, I start after the read code with a summary of Model 2, which we just ran. It's still in the console, so let's just look at it. The results make me want to keep only drink weekly, our exposure, and the age variables that were significant. I wrote code for that, and we'll run that to make Model 3.
I also will run a summary command on it so we can look at it. Let's run those. Highlight, and Control R. Let's look at our covariates. Great, all are significant. Now let's look at how I updated our model metadata. See, I get chatty here. I list the covariates in the model. You can copy and paste from your code instead of retyping. But then when I say which ones are significant, I lazily just put "all." Let's say all are significant except one.
You can just say "all except blah," blah being whichever one was not significant. You can use shorthand. I didn't need to fill in the adjusted R-squared because I really didn't like the model much. But I did it anyway, just as a demonstration. What's more important to fill in are the comments. Here I explain that I'm just including significant covariates from Model 2. Let's go back to our code. Here's where the art starts. Now I chose to add smoker as the first covariate I'm trying to add to the model.
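In the video the metadata table is filled in by hand. If you wanted to keep a similar log inside R, a minimal sketch might look like the following. The column layout, the `log_model` helper, and all variable names are assumptions for illustration, not the course's actual metadata format.

```r
# Hypothetical helper: appends one row per model to a metadata data frame.
log_model <- function(metadata, name, model, significant, comments) {
  new_row <- data.frame(
    model         = name,
    # Covariate list is taken from the model itself instead of being retyped.
    covariates    = paste(attr(terms(model), "term.labels"), collapse = ", "),
    significant   = significant,
    adj_r_squared = round(summary(model)$adj.r.squared, 4),
    comments      = comments,
    stringsAsFactors = FALSE
  )
  rbind(metadata, new_row)
}

# Simulated stand-in data. Every variable name here is hypothetical.
set.seed(7)
n <- 1000
dat <- data.frame(
  drink_weekly = rbinom(n, 1, 0.3),
  age3         = rbinom(n, 1, 0.3)
)
dat$sleep_hours <- 7 - 0.8 * dat$drink_weekly + 0.6 * dat$age3 + rnorm(n)

model3 <- lm(sleep_hours ~ drink_weekly + age3, data = dat)

model_metadata <- data.frame()
model_metadata <- log_model(
  model_metadata, "Model 3", model3,
  significant = "all",
  comments    = "Only the significant covariates from Model 2"
)
print(model_metadata)
```

The adjusted R-squared comes from `summary(model)$adj.r.squared`, so it is recorded even when you do not much like the model. The comments field is still typed by hand, since that is where the reasoning goes.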
See how I add it here in Model 4 and then do a summary command. Let's highlight and run that. Let's look at our results. Hey look, it was significant. Let's go put that in our metadata. Here it is. So now we have begun our forward stepwise modeling process. First I reminded you what we are doing, what models we are making, how the process works, and how we make decisions during the process. Then we started with a base model by keeping the significant covariates from Model 2, ran that, and documented it in our metadata.
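The step of adding smoker to the base model could be sketched like this. The data are simulated and the variable names are hypothetical. `update()` is used here as a shortcut; retyping the full `lm()` call with the extra covariate gives the same model.

```r
# Simulated stand-in data. Every variable name here is hypothetical.
set.seed(123)
n <- 1000
dat <- data.frame(
  drink_weekly = rbinom(n, 1, 0.3),
  age3         = rbinom(n, 1, 0.3),
  smoker       = rbinom(n, 1, 0.2)
)
dat$sleep_hours <- 7 - 0.8 * dat$drink_weekly + 0.6 * dat$age3 -
  0.5 * dat$smoker + rnorm(n)

# Base model: exposure plus the covariates that survived so far.
model3 <- lm(sleep_hours ~ drink_weekly + age3, data = dat)

# Forward step: add one candidate covariate to the existing formula.
model4 <- update(model3, . ~ . + smoker)
summary(model4)

# Check the new covariate's p-value, and see whether fit improved.
smoker_p <- summary(model4)$coefficients["smoker", "Pr(>|t|)"]
adj_r2 <- c(model3 = summary(model3)$adj.r.squared,
            model4 = summary(model4)$adj.r.squared)
print(smoker_p)
print(adj_r2)
```

If the new covariate had not been significant, the next forward step would start again from `model3` rather than `model4`.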
Next we successfully added smoker to our model and documented that. In the next section we are going to keep going to finish our forward stepwise modeling process and produce our final model.