Join Monika Wahi for an in-depth discussion in this video, "Choices of modeling approaches," part of Healthcare Analytics: Regression in R.
- [Instructor] In this movie, I'll explain the choices we have as we approach our regression modeling. Before we dive into modeling, I need to give you some background. Essentially, we can choose among three philosophical approaches: forward stepwise, backward stepwise, and ambidirectional stepwise. Even though I'm going to demonstrate forward stepwise in this course, I'll explain what the others mean and why I didn't choose them.
So in regression, there are three main ways people model: forward stepwise, backward stepwise, and ambidirectional stepwise. What does that literally mean? Remember your challenge: you have an exposure, you have an outcome, and then you have a grab bag of potential confounders. How do you decide which confounders belong in the model and which ones should be left out? In forward stepwise, you do that by running models one at a time, each time either adding a new variable or taking one out.
For example, if you try a variable and it does not fit, you throw it away and try a different variable in the next model. If it fits, you keep it and add a new variable in the next model. You're moving forward, building the model up. Then there is backward stepwise, which I don't like and which we won't be doing. In that situation, your first model would have every single covariate in it. Then, with each new model, you'd remove the covariate that fit the worst, until you got to a model where they all fit.
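The forward and backward procedures just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's R workflow: the transcript doesn't pin down what makes a variable "fit," so the criterion is passed in as a function (`fits` for forward, `badness` plus a `threshold` for backward). Those names, and the p-values in the example, are hypothetical placeholders; in a real analysis, each call would refit a regression model.

```python
from typing import Callable, Sequence

# Hypothetical criterion for forward stepwise: fits(model_vars, candidate)
# returns True if `candidate` "fits" when added to a model with model_vars.
FitFn = Callable[[list, str], bool]
# Hypothetical criterion for backward stepwise: badness(model_vars, var)
# scores how poorly `var` fits in a model containing model_vars (var included);
# higher means a worse fit.
BadnessFn = Callable[[list, str], float]


def forward_stepwise(candidates: Sequence[str], fits: FitFn) -> list:
    """Try covariates one at a time, in the analyst's chosen order."""
    kept = []
    for var in candidates:
        # Each candidate means one new model: the kept covariates plus `var`.
        if fits(list(kept), var):
            kept.append(var)  # it fits: keep it and move on
        # otherwise it is thrown away and the next candidate is tried
    return kept


def backward_stepwise(candidates: Sequence[str], badness: BadnessFn,
                      threshold: float) -> list:
    """Start with every covariate; drop the worst fit until all of them fit."""
    kept = list(candidates)  # the first model contains every covariate
    while kept:
        scores = {var: badness(list(kept), var) for var in kept}
        worst = max(kept, key=lambda var: scores[var])
        if scores[worst] <= threshold:
            break  # every remaining covariate fits
        kept.remove(worst)  # remove the worst-fitting covariate and refit
    return kept


if __name__ == "__main__":
    # Made-up p-values standing in for real model output (hypothetical).
    pvals = {"age": 0.01, "sex": 0.30, "smoking": 0.002, "bmi": 0.07}
    order = ["age", "sex", "smoking", "bmi"]
    print(forward_stepwise(order, lambda model, var: pvals[var] < 0.05))
    # → ['age', 'smoking']
    print(backward_stepwise(order, lambda model, var: pvals[var], 0.05))
    # → ['age', 'smoking']
```

Because these toy p-values ignore which other covariates are in the model, both directions land on the same set here; with real refitted models, a covariate's fit can change from one model to the next.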
Then finally, we have the watery, vague, loosey-goosey idea of ambidirectional stepwise, meaning both directions. You put some covariates in during one iteration, you take some out in the next, and it's more art than science. So how popular are these approaches? I am a forward stepwise fan, and I'll admit I really have no one on my side. Everyone seems to want backward stepwise, and I can't figure out why. There are a lot of issues with backward stepwise. First, if you try to put everything into the first model, it can break the software because of small cells, meaning combinations of categories that contain very few observations.
Also, it's much harder to decide which variables to take out. In forward stepwise, you add one, and if you don't like it, you take it out. But if the model starts with all this clutter, which one do you eliminate before the next round? It's hard to decide. At the end of forward stepwise modeling, I really have a feel for the data, which I don't get from backward stepwise. Theoretically, the data are the data: whether you start with forward or backward, you should meet in the middle at the same model, given the covariates you have.
So it doesn't matter for your final product; it just matters for the process. And I find the forward stepwise process both easier to carry out and easier to document. Now for the dirty secret about ambidirectional: both forward and backward stepwise are actually a little ambidirectional. That's because in forward stepwise, once I get to what I think is my final model, I run models that put back the covariates I took out, to see if they fit now. It's like deciding a scarf won't fit in your suitcase, so you don't pack it, but then after you pack everything else, you think you can shove it in the side, so you try to squeeze the scarf back in.
It's like that. I think I have a good-fitting model, but I try to shove the covariates I kicked out back in, just to make sure they really don't fit, because it would be nice to have the scarf with you on vacation, right? Backward stepwise technically gets ambidirectional too, because those analysts do the same thing I do: they shove the covariates that didn't survive the modeling process back in, one at a time, just to make sure they don't fit. So this is the dirty secret. At the end of the day, to finalize a model, both forward and backward stepwise get a little ambidirectional, but we really aren't allowed to say that.
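The final "shove the scarf back in" check can be sketched the same way. This minimal Python sketch assumes a hypothetical `fits(model_vars, candidate)` criterion like the one a forward stepwise run would use; `recheck_dropped` and the toy criterion are illustrative names, not functions from the course. It re-tries each dropped covariate one at a time against the final model and only reports the ones that fit now, leaving the decision to the analyst.

```python
from typing import Callable, Sequence

# Hypothetical criterion: fits(model_vars, candidate) returns True if
# `candidate` fits when added to a model that contains model_vars.
FitFn = Callable[[list, str], bool]


def recheck_dropped(final_model: Sequence[str], dropped: Sequence[str],
                    fits: FitFn) -> list:
    """Re-try each dropped covariate, one at a time, against the final model.

    Returns the dropped covariates that fit now; an empty list means the
    final model stands as it is.
    """
    now_fit = []
    for var in dropped:
        # One extra model per dropped covariate: the final model plus `var` only.
        if fits(list(final_model), var):
            now_fit.append(var)
    return now_fit


if __name__ == "__main__":
    # Toy criterion (hypothetical): "bmi" only fits once "smoking" is in the model.
    def toy_fits(model, var):
        return var == "bmi" and "smoking" in model

    print(recheck_dropped(["age", "smoking"], ["sex", "bmi"], toy_fits))
    # → ['bmi']
```

Each dropped covariate is tested against the same final model rather than cumulatively, which mirrors the one-at-a-time check described above; what to do when one does fit is left to the analyst.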
In the methods section, we either did forward stepwise or we did backward stepwise. We don't need to admit we got ambidirectional at the end. So this movie covered our three main approaches: forward stepwise, backward stepwise, and ambidirectional. I described the problems I've run into with backward stepwise, even though it's discussed more often in the scientific literature. There are just logistical problems with starting with a model loaded up with all your covariates and trying to Jenga them out, as I say in reference to the popular party game.
Instead, I prefer forward stepwise because I built my model up from a few variables and kept adding more, so I understand how I arrived at it. I'm just more psychologically comfortable with that approach. Theoretically, all analysts should be able to meet in the middle with a well-defended model, whether they go backward or forward. And as I said, people talk about ambidirectional, but all that really refers to is the fact that at the end of modeling, forward people sometimes take backward steps and backward people take forward steps. It's like dancing.
You're supposed to have fun and not get hung up on these details. So let's get modeling.