From the course: SAS Essential Training: 1 Descriptive Analysis for Healthcare Research

Histogram

- [Instructor] So far, we've looked at graphing categorical variables. In this movie, we will use continuous variables to make histograms. See, I opened code 210_Histogram. That's in your exercise files. We will be using PROC GCHART again, only this time we will make a histogram. Let's start by making a histogram with a continuous variable we are familiar with, which is average hours of sleep per night, otherwise known as SLEPTIM1. Notice VBAR? Remember that from the bar chart? But in the bar chart, we had a discrete option because we were using categorical data. Not anymore! Now, we are using average hours of sleep per night, so we have to tell SAS where to cut up the classes of hours of sleep per night, so the histogram will have bars of equal width. PROC GCHART handles this with the levels option. By setting the levels to seven, I'm forcing it to plot a histogram with seven bars. Now, you might be wondering how I picked seven, and not six, or eight, or any other number. I admit, I fussed around with the code off stage to get what I thought would be the perfect number of bars in the histogram. We'll see if you agree.
Let's highlight and run this code. So here are my seven bars. Can we see a distribution? That's the whole point of a histogram. If you say yes, then we are probably both seeing a distribution that is right skewed. If you say no, you can't see a distribution, you will want to go back to the code and increase the number from seven to something higher. The more bars, the more detail you are able to see in the distribution. But then, we are talking about a variable that doesn't have a very big range. After all, as we saw before, most people in the dataset slept six, seven, eight, or nine hours per night. Let's look at an example using a variable with a larger range. In our analytic dataset, we chose to include age groups. However, there is actually an age variable in the BRFSS dataset called _AGE80.
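For reference, the sleep histogram described above might look something like the sketch below. This is not the contents of the exercise file itself. The dataset name `work.analytic` is an assumption standing in for wherever your analytic dataset lives; only PROC GCHART, VBAR, SLEPTIM1, and the levels option come from the walkthrough.

```sas
/* Assumption: the analytic dataset sits in the WORK library under the
   hypothetical name "analytic". Change the libref and name to match yours. */
proc gchart data=work.analytic;
    /* No DISCRETE option this time, so SLEPTIM1 is treated as continuous.
       LEVELS=7 asks GCHART for seven midpoints, which gives seven bars. */
    vbar SLEPTIM1 / levels=7;
run;
quit;
```

Changing the number after `levels=` and rerunning is the quickest way to experiment with how many bars show the distribution best.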
To give you another histogram example, using a variable with a bigger range and also a bigger dataset, I thought we could go back to the initial dataset we read in, BRFSS_a, and visualize the age variable. Here, we have VBAR again and we put our variable _AGE80 after it. Then, I chose 20 levels to show you what happens when you have a variable with a large range in a large dataset and you want to make sure you see the distribution. Let's highlight and run this. See this interesting pattern? Most of the distribution appears to be skewed left, but look at age 80. See this huge bar? If you look in the code book, you will see that for privacy purposes, all respondents older than 80 were coded as 80. That is why you see this artifact, and that's a good reason to stick with using the age groups, rather than actual ages, when doing a big BRFSS analysis.
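A minimal sketch of the age histogram, followed by two optional checks on the spike at 80, could look like this. The `work` libref is an assumption, and the PROC MEANS and PROC FREQ steps are additions for illustration that are not part of the walkthrough; the cutoff of 75 in the WHERE statement is arbitrary.

```sas
/* Assumption: BRFSS_a is in the WORK library. Change the libref if needed. */
proc gchart data=work.BRFSS_a;
    /* Twenty midpoints spread across the wider range of ages. */
    vbar _AGE80 / levels=20;
run;
quit;

/* Illustrative check: if ages are top-coded, the maximum should be 80. */
proc means data=work.BRFSS_a n min max;
    var _AGE80;
run;

/* Illustrative check: tabulate the oldest ages to see the pile-up at 80. */
proc freq data=work.BRFSS_a;
    where _AGE80 >= 75;
    tables _AGE80;
run;
```

If the frequency count at 80 is far larger than the counts for the ages just below it, that matches the top-coding described in the code book.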
