From the course: SAS Essential Training: 1 Descriptive Analysis for Healthcare Research

Removing rows by categorical criteria - SAS Tutorial

From the course: SAS Essential Training: 1 Descriptive Analysis for Healthcare Research

Start my 1-month free trial

Removing rows by categorical criteria

- [Instructor] You should have a code file in your exercise files called 115_Apply Categorical Exclusions. See, it's the one I have open here. So the first records we want to exclude or remove from our dataset are rows for respondents who are not veterans because we are trying to study veterans. In our documentation, one in the VETERAN3 field means yes, the respondent is a veteran. See how we read in r.BRFSS_b in the set statement then read out r.BRFSS_c in the data step. That way if we ever need to go back to those records we excluded, we know we can roll back to BRFSS_b. Now here's the trick, adding this if statement, if VETERAN3 equals one adds the criterion on which SAS will rely to choose what rows to keep in the resulting data set r.BRFSS_c. Let's highlight and run this code then look at the log file to see how it went. The log file can be very interesting. There were 464,664 observations read from the data set R.BRFSS_B. Remember that was in the set statement. Then it says the data set R.BRFSS_C has 62,120 observations and 17 variables. This sounds consistent with what we know about the VETERAN3 variable. Remember how there were a little over 60,000 veterans in this data set and how we trimmed it down to 17 variables? So we feel pretty good after looking at the log file. Let's go back to our code. To filter in veterans, we used a data step with an if statement and kept the records if they met a certain criterion. There is another way of doing this using the delete command. Here I'm going to create a temporary data set in the work directory called Delete_Example just to show you what I mean. In this example, we put it this way, if VETERAN3 not equal to one then delete. The ne in the syntax stands for not equal. I've created a handout that list the most commonly used operators in SAS and it's in the exercise files for this movie. Check it out. So what this code says is that SAS should read in r.BRFSS_b then output the data set Delete_Example deleting all the records where VETERAN3 does not equal one. Let's highlight and run this code then switch over to the log file to see what happened. See the messages in blue, especially the numbers? You can convince yourself then that the two different ways of formulating the code filtered in the same number of records. Let's go back to our code. So now we've applied our first exclusion to BRFSS_c which now only contains veterans. Next we are going to only want to keep records with a valid value for the exposure variable DIABETE3. So in our data step, we read in r.BRFSS_c in the set statement and read out r.BRFSS_d in the data statement. And here we have the if, if DIABETE3 n and then we put a parenthesis and list all the values that are valid separated by commas. Remember one is diabetes, two is diabetes during pregnancy, and three and four are different versions of no. We have to keep the yeses and the nos because we need both the exposed and unexposed in our data. We need to remove everyone who didn't tell us if they were exposed or unexposed. Let's highlight and run that code then look at the log file. The message says that BRFSS_C has 62,120 records and BRFSS_D has 62,014 records, meaning 106 were removed if I did my math right. That seems consistent with the proc freq we did on DIABETE3 earlier. Let's go back to our code. The last categorical exclusion we will apply now is for the categorical outcome ASTHMA3. You will see I read in BRFSS_d, read out BRFSS_e and I only keep the records in BRFSS_e that have the value of either one meaning has asthma or two meaning no asthma in the data set. In other words, if the respondent didn't know if they had asthma or not, the record would be removed in this step. Let's highlight this code and rock and roll then we'll check the log file. BRFSS_E now has been reduced to 61,833 records. We are moving right along. Meet me in the next movie where we apply exclusions to our continuous variable SLEPTIM1.

Contents