Join Monika Wahi for an in-depth discussion in this video, Scientific method review, part of Healthcare Analytics: Regression in R.
- [Narrator] Welcome to Chapter 1. Before we start working on creating our hypotheses for the regressions in the course, let's review the scientific method and how to apply it. A main step in the scientific method is the formulation of a hypothesis, so we will discuss which components, or ingredients, belong in a hypothesis: subpopulation, exposure, and outcome. Before we proceed, I want to remind you of a slide I showed you during the prerequisite course in this series on descriptive analysis.
This slide was about two different types of analyses done with BRFSS data: descriptive and analytic. When we talk about using the scientific method with BRFSS, we are saying that we are choosing to do an analytic study, not a descriptive one. And when you do an analytic study, you are guided by at least one hypothesis you formed before embarking on the study. We will talk about forming that hypothesis now as part of the steps of the scientific method.
Here are the basic steps for the scientific method. First, we identify the problem. What do we want to study? Next, we gather the data. Okay, we already have an issue, right? That is because the BRFSS data are already gathered. In this case, rather than gather data, let's start by gathering some knowledge about the potential problems we can study with the BRFSS data set. Let's do that by opening up the questionnaire for the 2014 BRFSS Survey.
Here's the table of contents for the questionnaire. Notice in the questionnaire that the core sections are listed at the top and the modules at the bottom. For this course, we will use the core only. Remember, you have to request the module data from individual states. Even though there are 18 sections, which looks like a lot, there are really only a few questions in each section. Let's pick out something that we think is important, then gather knowledge about it. Here is the first ingredient of our hypothesis. We need to define a subpopulation of people in the BRFSS, such as Hispanics.
I am already aware that there are racial and ethnic variables that could be used to select only Hispanics from the data set. Because of my past experience, I know that they're here in Section 8, the Demographics portion. Let's go look at that. I also know from prior knowledge that Hispanics are an important group to study because they have been shown to be a subpopulation that displays unique patterns and associations between different outcomes and diseases and different situations and behaviors. For the next ingredient in our hypothesis, we need to define an exposure: something that you think causes, protects from, or is somehow associated with a disease or outcome.
And it has to be in the BRFSS. I know there's a question on regular exercise, so I put that in as an example of an exposure you can pick. Let's look at the questionnaire to see where we'd find this. Sure enough, here it is, in Section 4. Let's go look at it. It's question 4.1, and it can then be operationalized to define the exposure in the data. The hypothesis presents these ideas conceptually, but then the analyst has to operationalize these ideas into programming logic that can identify these concepts in the data.
Notice that if I had wanted to study diet as an exposure, I could not use this BRFSS data set. Do you see any section on diet in 1 through 18? Because there is no diet section, I could not choose anything to do with diet as an exposure. Some see the BRFSS as limiting because of this. On the other hand, the CDC uses the BRFSS to define the analytic space for risk factors. And they tend to pick what is important.
So if it's not on here, it's not on the radar of the nation's public health effort. If you want to do something disruptive, you will have to collect your own data. Finally, we will need to select a disease or outcome that we are hypothesizing is affected by the exposure. The exposure could cause the disease, like when we think of the exposure of smoking and the disease of lung cancer. The exposure can also protect against disease, as we might think in this case, where we are choosing engaging in regular exercise as the exposure and lower risk of diabetes as the disease or outcome.
I want to call your attention to Section 6 of the questionnaire, which asks about a laundry list of chronic diseases. Diabetes is in here at the end. Section 6 is a good place to shop for an outcome. So in conclusion, we had a little Epidemiology 101 where I reviewed the scientific method for us. I also showed you how to shop in the BRFSS questionnaire for the components of the hypothesis. I gave you an example of choosing a subpopulation, an exposure, and an outcome.
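As a rough illustration of what operationalizing the example hypothesis could look like, here is a minimal sketch. It is written in Python only for illustration; the course itself works in R. The field names (`hispanic`, `exercise`, `diabetes`) and the yes/no codes (1 = yes, 2 = no, anything else treated as missing) are hypothetical placeholders, not the actual BRFSS variable names or codebook values, and the records are toy data.

```python
# Toy records standing in for BRFSS respondents.
# ASSUMPTION: field names and codes are hypothetical placeholders,
# not the real BRFSS variable names or codebook values.
YES, NO = 1, 2  # assumed coding: 1 = yes, 2 = no, anything else = missing

respondents = [
    {"hispanic": 1, "exercise": 1, "diabetes": 2},
    {"hispanic": 1, "exercise": 2, "diabetes": 1},
    {"hispanic": 2, "exercise": 1, "diabetes": 2},  # outside the subpopulation
    {"hispanic": 1, "exercise": 9, "diabetes": 2},  # exposure missing
    {"hispanic": 1, "exercise": 1, "diabetes": 1},
]


def yes_no_flag(code):
    """Map an assumed yes/no code to True/False, or None if missing."""
    if code == YES:
        return True
    if code == NO:
        return False
    return None


def build_analytic_set(records):
    """Keep the subpopulation and attach exposure and outcome flags."""
    analytic = []
    for record in records:
        # Subpopulation: keep only respondents coded as Hispanic.
        if record["hispanic"] != YES:
            continue
        # Exposure: regular exercise. Outcome: diabetes.
        exposed = yes_no_flag(record["exercise"])
        diabetes = yes_no_flag(record["diabetes"])
        # Drop records where either concept cannot be determined.
        if exposed is None or diabetes is None:
            continue
        analytic.append({"exposed": exposed, "diabetes": diabetes})
    return analytic


analytic = build_analytic_set(respondents)
for row in analytic:
    print(row)
```

Each of the three hypothesis ingredients becomes one explicit rule here: a filter for the subpopulation and a yes/no flag each for the exposure and the outcome. In a real analysis, the rules would be written against the variables and response codes documented for the BRFSS year being used.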
- Dealing with scientific plausibility
- Selecting a hypothesis
- Interpreting diagnostic plots
- Working with indexes and model metadata
- Working with quartiles and ranking
- Making a working model
- Improving model fit
- Performing linear regression modeling
- Performing logistic regression modeling
- Performing forward stepwise regression
- Estimating parameters
- Interpreting an odds ratio
- Adding odds ratios to models
- Comparing nested models
- Presenting and interpreting the final model