From the course: SAS Programming for R Users, Part 1

Demo: Bayesian Logistic Regression - SAS Tutorial

- [Instructor] Now before we exit SAS Studio, I want to show you a few demonstrations of what I'm working up to in this series, so what I hope you'll be able to do by the end of it. I want to show you some more advanced programs. I'm not going to talk through the details of the code, but again, I'm just showing you what we're going to be working up to. So under the Files and Folders tab, navigate to your data and programs, and open up the spr401d01.sas file. In this program, I have four separate demonstrations.
In the first one, I'm going to be doing a Bayesian logistic regression, and I'll be using a common dataset, the low birth weight babies dataset. Of great concern to doctors are babies being born with low birth weights, which are classified as less than 2500 grams. Why are they of great concern? Because they have greater health problems. So what we want to do is create a model to predict when a baby is born with a low birth weight.
The first piece of code you notice is a FORMAT procedure. This helps us alter the display of observations in our data table, but we'll get to that in chapter two. One of the things you notice, though, is that this procedure has five unique statements. How do I know? I can just count the semicolons.
Next I have my commented-out code. Again, anything between the /* and the next */ is going to be commented out. Here it's just a description of the data that I'm going to be reading in in just a minute. So I have the identification code, which is going to be ID in my dataset. I have low birth weight, which I'm calling LOW in my dataset. In this case we're coding a low birth weight as a value of one, and a baby born with a birth weight greater than or equal to 2500 grams as a value of zero.
We also have the age of the mother in years, the weight in pounds of the mother at the last menstrual period, the ethnicity, the smoking status (did the mother smoke during the pregnancy or not), history of premature labor, history of hypertension (yes or no), presence of uterine irritability, the number of physician visits during the first trimester, and here we have the original variable, Birth Weight in Grams. So the LOW variable was actually created from the final variable here, BWT, Birth Weight in Grams. The low birth weight variable, again, is just a binary variable.
We've already talked a little bit about PROC steps. We are also going to use DATA steps in SAS. PROC steps in general, of course, analyze data. DATA steps are going to be used to read in data, alter data, add new variables, subset data, manipulate your dataset, manage your dataset, whatever you need to do. After the DATA keyword, when I'm reading in new data, I'm going to specify my library and the new dataset name that I'm working with. So this dataset is going to be called birth. And then here I have a bunch of different statements, INPUT, IF, LABEL, FORMAT, and so on, which I'll talk about later in the series. I'm going to highlight this entire DATA step, which again is reading in my birth weight data, which you can see here in yellow. After the RUN statement, I'm going to run the step so the data is available to use. So this dataset has 189 observations and 11 total variables.
Now the first thing you're going to want to do when you read in your dataset is look at some summary statistics. Here I'm going to be using the UNIVARIATE procedure to look at a histogram and analyze that original birth weight variable. I'm also going to use the FREQ procedure to look at the classification variables: low, smoke, hypertension, and premature labor. So in this dataset, the mean birth weight is about 2900 grams, and the median is also about 2900 grams.
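As a rough sketch of the steps described so far (a DATA step that builds the birth table, then the UNIVARIATE and FREQ procedures), the code might look something like the block below. This is not the course's spr401d01.sas program. The WORK library, every variable name other than ID, LOW, and BWT, the cutoff written as "below 2500 grams," and the four data rows are all assumptions made only so the example runs on its own.

```sas
/* Sketch only. The WORK library, the variable names AGE, LWT, SMOKE,
   PTL and HT, and the four data rows are placeholders, not course data. */
data work.birth;
   input id age lwt smoke ptl ht bwt;
   /* Derive the binary outcome from the original birth weight */
   if bwt < 2500 then low = 1;
   else low = 0;
   label low = 'Low birth weight (1 = yes, 0 = no)'
         bwt = 'Birth weight in grams';
   datalines;
1 19 182 0 0 0 2523
2 33 155 0 0 0 2551
3 25 105 1 1 1 1928
4 29 130 1 0 0 2920
;
run;

/* Summary statistics, histogram and Q-Q plot for birth weight */
proc univariate data=work.birth;
   var bwt;
   histogram bwt / normal;
   qqplot bwt / normal(mu=est sigma=est);
run;

/* One-way frequency tables for the classification variables */
proc freq data=work.birth;
   tables low smoke ht ptl;
run;
```

With only four placeholder rows the output is not meaningful; the point is the shape of the program, one DATA step followed by two PROC steps, each ended by a RUN statement.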
The histogram shows that birth weight is slightly skewed. I also generated a Q-Q plot to check the assumption of normality, and again there are slight tails in the plot. And here we have a few tables from the FREQ procedure. So in this dataset, we have 59 babies born with a low birth weight and 130 born with a birth weight of 2500 grams or more. 74 mothers smoked during their pregnancy, 12 mothers had a history of hypertension, and 30 had a history of premature labor.
Okay, once you've looked at some summary statistics and plots, you're then going to go ahead and apply some advanced procedure to create your model. I'm going to be doing a Bayesian logistic regression, and it's completely fine if you're not familiar with Bayesian analysis. I just wanted to show you one quick example of an advanced procedure. So here I'm using PROC MCMC, which stands for Markov Chain Monte Carlo. We're working with the birth dataset that we just read in. I'm then specifying my parameters and priors, which are specific to Bayesian analysis. Again, I'm doing a logistic regression. And in my model I'm specifying the variables smoke, hypertension, weight of the mother at the last menstrual period, and premature labor. So I'm only using four variables to try to predict when a baby is born with a low birth weight. And of course, I could've added in the rest if I wanted as well. But let's run this procedure to get back some results.
Okay, here we have the output: the parameters, beta0 through beta4, and the prior distributions that I placed on them, then the Posterior Summaries, with the mean or estimate of my coefficients. We also have the Posterior Intervals. It's also important to understand the autocorrelations, so here we have the autocorrelation estimates at several lags for each coefficient. And one of my favorite things about SAS is that it's always generating relevant graphics for you. So because it knows I'm doing a Bayesian analysis, it automatically goes ahead and gives me the following plots.
So here I have the trace plot of my coefficients, so those are the actual samples from my posterior, the autocorrelation-by-lag plot, and also the posterior density of each coefficient. In this case, I'm just looking at beta0. It gives those graphics to me automatically. And if I scroll down, I'll see the graphics for the other parameters, beta1 through beta4.
And of course at this point, I'd want to follow up on this model to see if it actually predicted well. Maybe I'd want to score some new datasets or add in more variables for prediction. But again, I just wanted to show you one instance of an advanced procedure. To sum up what we did: we read in the data with a DATA step, we then analyzed it with some lower-level procedures to get summary statistics and some basic plots, and finally, we used the MCMC procedure to fit a Bayesian logistic regression model.
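For reference, a Bayesian logistic regression with an intercept and four predictors could be written in PROC MCMC roughly as follows. This is a minimal sketch, not the program shown in the video. It assumes a work.birth table with variables named LOW, SMOKE, HT, LWT, and PTL, and the starting values, the normal priors, the seed, and the iteration counts are illustrative choices that the transcript does not specify.

```sas
/* Sketch only. Assumes work.birth exists with LOW, SMOKE, HT, LWT, PTL.
   Starting values, priors, seed and iteration counts are illustrative. */
proc mcmc data=work.birth seed=1234 nbi=1000 nmc=10000;
   /* Five coefficients (intercept plus four slopes), all started at 0 */
   parms beta0 0 beta1 0 beta2 0 beta3 0 beta4 0;
   /* Diffuse normal prior on every parameter whose name begins with beta */
   prior beta: ~ normal(0, var=1000);
   /* Inverse-logit of the linear predictor gives P(low = 1) */
   p = logistic(beta0 + beta1*smoke + beta2*ht + beta3*lwt + beta4*ptl);
   /* Bernoulli likelihood for the 0/1 outcome */
   model low ~ binary(p);
run;
```

The PARMS and PRIOR statements correspond to the "parameters and priors" mentioned in the demo, and the MODEL statement ties the binary LOW outcome to the probability computed from the four predictors.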
