From the course: SAS Essential Training: 1 Descriptive Analysis for Healthcare Research

PROC TABULATE for categorical analysis - SAS Tutorial

PROC TABULATE for categorical analysis

- [Narrator] I promised you that I'd show you how to make a Table 1 entirely within SAS, without using Excel, and we are going to do that now in Proc Tabulate. If you are following along, please open your exercise file 400_Proc Tabulate for Categorical. First, I want to point out that I did not put all of the independent variables into this table, because the default output takes up a lot of space, which makes the table very long unless you add options. Second, I want to let you know I included a white paper in your exercise files called "Proc Tabulate and the Neat Things You Can Do With It" by Wendi L. Wright, so you can better understand Proc Tabulate if you want, and do some exploring on your own.

Next, let's dig into this code. First, you have to do a data step. Why? Because you have to attach labels to the variables, and you can do that in the data step. These labels will show up in your Proc Tabulate output. You'll see we are reading in the data set "Analytic" and creating the data set "Example", which will then have our labels on it. The syntax of the label statement is the word "label", then the variable name, then an "=", and then whatever you want to label it, in quotes.

Next, you have to add something SAS calls "formats" to the levels of the categorical variables. For our purposes here, a format labels the levels of a categorical variable. Let's look at our first format. As you can see, we declare Proc Format, then on the next line we use the value statement to create a format we are going to call "Asthma_F". SAS users often use this naming convention, where the format for a variable is named like the variable, only with an underscore F after it. Notice there is no semicolon at the end of that line. To specify the values of each format, we put the number, then an "=", then what we want to label that level, in quotes. In our asthma variable, remember 1 was "has asthma" and 2 was "no asthma", and those were the only levels.
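As a rough sketch of the two steps described here (not the actual exercise file), the data step and Proc Format might look something like the code below. The data set names Analytic and Example, the format name Asthma_F, and the 1/2 coding come from the narration; the toy data rows and the label text are assumptions made up for illustration so the code can run on its own.

```sas
/* Hypothetical stand-in for the course's Analytic data set.
   The rows below are made up; the real file is not shown here. */
data analytic;
    input asthma3;
    datalines;
1
2
2
1
2
;
run;

/* Data step: read Analytic and write Example with a label attached.
   Syntax: label, variable name, "=", label text in quotes.
   The label text itself is an assumption. */
data example;
    set analytic;
    label asthma3 = "Asthma status";
run;

/* Proc Format: define the format Asthma_F.
   This only creates the format; it is not attached to a variable yet. */
proc format;
    value asthma_f              /* no semicolon on this line */
        1 = "Has asthma"
        2 = "No asthma"
    ;                           /* closing semicolon on its own line */
run;
```

Running this produces no output window results, only log notes, which matches what the narration describes next.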
Then you'll see I put the semicolon on the next line, before the next format, to keep my code neat. Please notice that even though the variable this format goes with is named "Asthma3", the format is not actually attached to the variable in this step. Only the format "Asthma_F" is created. You'll see I create the formats for the other variables on our list. And then we have a run statement at the end. Let's run both our data step, to attach the labels, and the Proc Format, to create the formats. It feels like nothing happened, but let's look at the log file. Actually, everything happened correctly. We just don't have any output yet. Let's go back to our code.

Okay, here is our Proc Tabulate code. First, we call the proc and specify the data set. Then, we attach the formats. See? We use the format statement, then state the variable, and then state the format name, followed by a period. You'll see we do this for all the variables before the next semicolon. Remember the class statement? That tells SAS which categorical variables we want to use in the table. That has a semicolon at the end.

Next comes the table statement. This is by far the most difficult syntax for me to explain. It basically specifies how the table should be laid out. Remember how we start with an "All" row? Where it says All, Diabete4, _age_g, and sex in parentheses, we are listing what rows we want and in what order. Next, after the comma, in parentheses, we list what columns we want and in what order, which are "All" and then both asthma levels. Finally, after the asterisk, we list what calculations we want in the cells. In our case, we want "n", and then column percent. The "f=4.1" is formatting the column percent.

Let's run this code and look at the output. Here you go. I included this output in your exercise files for this movie. As you can see, it's a little different from what we made in Excel, and there are still things a snob like me might change. I'm sure there's a way to change this label, for example.
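To make the table syntax a little more concrete, here is a minimal sketch of a Proc Tabulate call with the same shape as the one described: rows in the first set of parentheses, a comma, columns in the second set, then an asterisk and the cell statistics. The variable names and the Asthma_F coding follow the narration; the toy data values and label text are invented, and only the asthma format is attached because the codings for the other variables are not given here. "Column percent" is written with the SAS keyword colpctn.

```sas
/* Toy stand-in data; every value below is made up for illustration */
data example;
    input asthma3 diabete4 _age_g sex;
    label asthma3  = "Asthma status"
          diabete4 = "Diabetes status"
          _age_g   = "Age group"
          sex      = "Sex";
    datalines;
1 1 1 1
2 2 2 2
2 1 3 1
1 2 1 2
2 2 2 1
2 1 3 2
;
run;

/* Format for the asthma levels (1/2 coding as stated in the narration) */
proc format;
    value asthma_f
        1 = "Has asthma"
        2 = "No asthma"
    ;
run;

proc tabulate data=example;
    /* Attach the format: variable name, then format name ending in a period */
    format asthma3 asthma_f.;
    /* Every categorical variable used anywhere in the table is listed here */
    class asthma3 diabete4 _age_g sex;
    /* (rows) , (columns) * (cell statistics) */
    table (all diabete4 _age_g sex),
          (all asthma3) * (n colpctn*f=4.1);
run;
```

The class statement lists the column variable asthma3 as well as the row variables, since Proc Tabulate needs every classification variable declared before it can appear in the table statement.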
There's probably also a way to add a comma for frequencies over 999. Proc Tabulate is very flexible, but you have to be patient adding options and formatting. If you are interested in that, please take a look at the white paper I mentioned that I included in your exercise files. All right, next, let's look at Proc Tabulate for a continuous descriptive table.
