From the course: Introduction to jamovi

Binomial logistic regression - jamovi Tutorial


Binomial logistic regression

A common task in analyzing data is classifying cases into one category or another based on a number of other variables you might have. For instance, your computer might be trying to decide whether a particular email is spam or not, or you might be trying to decide whether a particular person is likely to buy your product or not. Those are dichotomous classifications, and one common way of predicting dichotomous outcomes is to use what's called binomial logistic regression, where binomial means two names. It's closely related to linear regression (it models the log-odds of the outcome as a linear combination of the predictors), but it's adapted for placing cases into one group or another based on the probabilities that are predicted from your other variables. This is easiest to show with an example, so I'm going to use the state data I have. I've got a number of variables about personality characteristics on a statewide basis and about search terms. The dichotomous variable I have in here is whether the state's current governor is a Republican or a Democrat. Let's start with a little bit of exploration so we know what we're dealing with. I'm going to take governor, put it over here, and get a frequency table and a bar plot. At this exact moment, of the lower 48 states in the United States, about 2/3 have Republican governors. So we're going to see if we can use some of the other data in this dataset to classify states, that is, to predict which ones have Republican governors and which ones have Democratic governors. To do this, I'll close that, go to Regression, and come down to Logistic Regression, 2 Outcomes, or Binomial. Again, binomial means two names or two categories. I'll click on that.
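To make the idea concrete, here is a minimal Python sketch of how a fitted binomial logistic regression turns predictor values into a probability and then into a category. The coefficient values in the usage lines are made up for illustration, not the estimates from this dataset, and the 0.5 cut-off is just a default.

```python
import math


def logistic(log_odds):
    """Convert a log-odds value into a probability between 0 and 1."""
    # Two algebraically equal forms, chosen so math.exp never overflows.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def predicted_probability(intercept, slopes, values):
    """Probability of the non-reference outcome for one case.

    The model is linear on the log-odds scale:
    log_odds = intercept + b1*x1 + b2*x2 + ...
    """
    log_odds = intercept + sum(b * x for b, x in zip(slopes, values))
    return logistic(log_odds)


def classify(probability, cutoff=0.5):
    """Return 1 (non-reference category) if the probability exceeds the cut-off, else 0."""
    return 1 if probability > cutoff else 0


# Hypothetical coefficients for three standardized predictors (not real estimates).
intercept = 0.7
slopes = [0.2, 1.1, -0.1]
p = predicted_probability(intercept, slopes, [0.5, 1.0, -0.3])
print(round(p, 3), classify(p))
```

A log-odds of 0 corresponds to a probability of exactly .5, and each predictor shifts the log-odds up or down by its coefficient times its value.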
The first thing we need to do is put in our dependent variable, that's the outcome variable, the thing we're trying to predict, and that's going to be governor. I don't even have to tell it what each value means, because I actually have it written as words in the dataset, and jamovi is smart enough to tell that those are categories. Then we get to pick some covariates. I could pick a lot here, but let's pick just the social media ones, just for fun. So I'm going to come down here to Instagram, Facebook, and retweet. By the way, the reason it says retweet is that Google Correlate wouldn't let me search for Twitter. I don't know why, but since retweet is exclusively a Twitter word, it seemed like a good substitute. I'm going to put those all into Covariates, and those are the three variables I'm going to use together to try to predict which states have Democratic governors and which states have Republican governors. You can see right here we've got a model that's actually working pretty well. We've got the three variables plus the intercept. The intercept is significantly different from zero, Instagram is not statistically significant, Facebook is, and retweet is nowhere close. But there's a lot of other information we can get through the options in jamovi, so let's take a quick look. First, Model Builder. This is where we choose how to enter the variables, and we can put them in blocks if we want to. I don't feel the need to do that in this case, but I showed how that works in the video on linear regression. Next, Reference Levels. One of the categories needs to be taken as the baseline, and we're trying to predict a change to the other category. Because there are more Republican governors, let's just have Democrat as the baseline and we'll go up from there. Then there's Assumption Checks.
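Choosing a reference level amounts to deciding which category is coded 0 and which is coded 1; the model then predicts the probability of the non-reference category (here, a Republican governor). The sketch below shows that coding in plain Python. The function name `encode_outcome` is hypothetical, not part of jamovi, and the example labels are illustrative only.

```python
def encode_outcome(labels, reference):
    """Code a two-category outcome: 0 for the reference level, 1 for the other level."""
    levels = sorted(set(labels))
    if len(levels) != 2:
        raise ValueError(f"expected exactly 2 categories, got {levels}")
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not in {levels}")
    return [0 if label == reference else 1 for label in labels]


governor = ["Republican", "Democrat", "Republican", "Republican"]  # illustrative values only
print(encode_outcome(governor, reference="Democrat"))  # [1, 0, 1, 1]
```

With Democrat as the reference level, a positive coefficient means higher values of that predictor go with higher odds of a Republican governor. Swapping the reference level flips the sign of every coefficient, so each odds ratio becomes its reciprocal.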
Collinearity is an issue here, especially because I'm pretty sure these three social media terms are related as Google search terms. Checking it gives us both the VIF, the variance inflation factor, and the tolerance, and there are indications here that we've got some collinearity. Another useful statistic to get with binomial logistic regression is the odds ratio, and it's also nice to get its confidence interval, which adds a few columns to this table. You can see we've got the odds ratio right there. In so many other statistics, zero is the base value and things either go positive or negative. But the nothing's-happening value for an odds ratio is one, meaning a one-to-one ratio. It can go below one, getting closer and closer to zero without ever reaching it, and it can go up from one without limit. So, for instance, the intercept is reliably above zero, and you can see that both limits of its odds ratio confidence interval are above one. For Instagram, the interval goes from below one to above one, so it includes one. For Facebook, both limits are above one; they're on the same side. These give us an idea of which variables predict the odds of a particular state having a Republican governor, based on these three social media variables from Google Correlate. If we come down a little further, we can go to Estimated Marginal Means. Facebook was significant within the context of these three predictor variables, so let's put it in here for marginal means. It also gives us a nice curved chart, if I come down here for a moment. This shows the probability of a state having a Republican governor, going from zero to one, that's 0% to 100%, based on the z-score that state has on Google searches for Facebook. You can see that when states search for Facebook less than other states, they're less likely to have a Republican governor, but when they search more, they're more likely to have a Republican governor.
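The two calculations behind these columns can be sketched in Python. An odds ratio is exp(b) for a coefficient b on the log-odds scale; this sketch builds its confidence interval with the Wald formula exp(b ± z·SE), which is an assumption, since jamovi may compute its interval differently. For collinearity it uses the classic predictor-based VIF (the diagonal of the inverse of the predictors' correlation matrix) with tolerance = 1/VIF; jamovi's values for a logistic model may be computed from the fitted model instead, so they need not match exactly. All input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(b, se, level=0.95):
    """Odds ratio exp(b) with a Wald confidence interval exp(b ± z*se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def correlation_matrix(columns):
    """Pearson correlations between predictor columns (lists of equal length)."""
    means = [sum(c) / len(c) for c in columns]
    dev = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in col)) for col in dev]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix; tolerance = 1 / VIF."""
    inv = invert(correlation_matrix(columns))
    vifs = [inv[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]


# Hypothetical coefficient and standard error on the log-odds scale.
or_, lo, hi = odds_ratio_ci(0.9, 0.35)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], includes 1: {lo <= 1 <= hi}")

# Hypothetical predictors; x2 is nearly a copy of x1, so both get a large VIF.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
x3 = [2.0, -1.0, 0.5, 1.5, -2.0]
vifs, tolerances = vif_and_tolerance([x1, x2, x3])
print([round(v, 1) for v in vifs], [round(t, 3) for t in tolerances])
```

An interval whose limits both sit on the same side of one corresponds to a coefficient whose interval excludes zero on the log-odds scale, which is why the odds ratio and the significance test tell the same story.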
This is a nice way of looking at the effect, because logistic regression models the probability with an S-shaped curve that changes as the predictor changes. This chart shows just the one variable, but the model actually uses all of the variables together to calculate the probabilities. Another nice thing about categorization tasks like this is that you can get a classification table. I'll click on that right here, and it tells me what the model predicts for each state versus what the state actually has. What's interesting, too, is that you can change the cut-off value. Right now it's only getting 40% of the Democratic states correct, and it's getting 91% of the Republican ones. So let's take a quick look at what's called a cut-off plot, which gives us a chart of what's called sensitivity and specificity. Here, sensitivity is the proportion of states that actually have a Republican governor that the model classifies as Republican, and specificity is the proportion of states with a Democratic governor that it correctly classifies as Democratic. This line going up is specificity, and this one going down is sensitivity. In many situations these two lines cross near the .5 point, but these cross a lot closer to .7. So I'm going to change the cut-off from .5 to .7, right here. You'll see that it changes the classification table, because now it's saying: only classify a state as Republican if it has over a 70% chance of being Republican. That actually makes sense, too, because slightly over 2/3 of the governors in the country as a whole are Republican. The cut-off line is now a lot closer to where the two curves cross, which is usually where you want it, because that's where sensitivity and specificity are balanced. And it's changed the classification table.
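The classification table and the cut-off plot both come from comparing predicted probabilities with a cut-off. Below is a minimal Python sketch of that logic, treating Republican as the positive (non-reference) category as in this model. The probabilities and outcomes are invented for illustration, the strict `>` comparison is an assumption about how ties at the cut-off are handled, and picking the cut-off where sensitivity and specificity are closest is one common way to read a cut-off plot, not necessarily something jamovi computes for you. The function names and the `steps` parameter are hypothetical.

```python
def classification_rates(probabilities, actual, cutoff=0.5):
    """Sensitivity and specificity at a given cut-off.

    actual: 1 for the non-reference category (e.g. Republican), 0 for the
    reference (e.g. Democrat); both categories must be present.
    A case is predicted as 1 only if its probability is strictly above the cut-off.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, actual):
        predicted = 1 if p > cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # share of actual 1s classified as 1
    specificity = tn / (tn + fp)  # share of actual 0s classified as 0
    return sensitivity, specificity


def balanced_cutoff(probabilities, actual, steps=100):
    """Scan cut-offs 1/steps .. (steps-1)/steps and return the first one where
    sensitivity and specificity are closest (roughly where the curves cross)."""
    best, best_gap = None, float("inf")
    for i in range(1, steps):
        cutoff = i / steps
        sens, spec = classification_rates(probabilities, actual, cutoff)
        gap = abs(sens - spec)
        if gap < best_gap:
            best, best_gap = cutoff, gap
    return best


# Invented predicted probabilities and actual outcomes (1 = Republican, 0 = Democrat).
probs = [0.30, 0.60, 0.55, 0.80, 0.75, 0.65]
actual = [0, 0, 1, 1, 1, 1]
for c in (0.5, 0.7):
    print(c, classification_rates(probs, actual, c))
print("balanced cut-off:", balanced_cutoff(probs, actual))
```

In this toy example, raising the cut-off from .5 to .7 moves sensitivity from 1.0 to 0.5 and specificity from 0.5 to 1.0: the same trade-off between the two categories that the classification table shows when the cut-off changes.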
Instead of 40%, we now have 73% correct for the Democratic governors, and for the Republican governors we've gone from 91% to 73%, but now it's balanced out in terms of how accurate it is for the two different conditions. And so these are some of the methods that jamovi gives you for looking at the relationship between several predictor variables, like the three social media search terms, and how they can be used to predict the classification of a dichotomous outcome, like a Republican or Democratic governor, or any other time you have two distinct outcomes that you're trying to predict.
