From the course: SAS® 9.4 Cert Prep: Part 05 Analyzing and Reporting on Data

Demo: Creating summary statistics reports

- [Instructor] The means procedure is great for calculating basic summary statistics and looking for numeric values that might be outside of an expected range. But now we're beyond validation, and we can use proc means to generate complex reports that include various statistics and groupings within the data. You've used the var statement in proc means to specify the numeric columns to analyze. In the proc means statement, you can specify the statistics that you want to calculate and how they should be displayed. The class statement enables you to name one or more columns to group the data, and then statistics are calculated for each unique value of the class columns. When you have more than one class column, you can use the ways statement to control which combinations of the class columns are used. Let's see these new options and statements in a demo.
Let's look at the default proc means report and additional options that we can use to customize the output. So to start with, we'll use proc means, data equals pg1.storm_final, and the variable that we'll analyze is MaxWindMPH. I'll run this program, and we have statistics produced for the entire table. There were 3,071 known values for MaxWindMPH, and the report includes the mean, standard deviation, minimum, and maximum.
So what other options do we have? Well, first of all, on the proc means statement we can request different statistics or change the order of the statistics in the report. I'll add options to compute the mean, the median, min, and max, and I'll also use the maxdec= option. This allows me to specify how many decimal places I want displayed for each of the statistics. I'd like everything rounded to the nearest whole number, so I'll set it to zero. Let's take a look at what that produces. Notice we have these four statistics calculated for the entire table. But what if we would like to separate the statistics and calculate values within groups?
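Before moving on to groups, the two programs run so far can be sketched roughly as follows. This is a minimal sketch, not the instructor's exact code: it assumes the pg1 library is already assigned, that the table is named pg1.storm_final (the transcript renders it as "storm final"), and that maxdec=0 is the value used to round to whole numbers.

```sas
/* Default report: N, mean, standard deviation, minimum, and maximum */
/* for the entire table (assumes the pg1 library is already assigned) */
proc means data=pg1.storm_final;
    var MaxWindMPH;
run;

/* Request specific statistics; they appear in the order listed.     */
/* maxdec=0 (assumed value) displays each statistic with no decimals */
proc means data=pg1.storm_final mean median min max maxdec=0;
    var MaxWindMPH;
run;
```

Listing statistic keywords on the proc means statement replaces the default set, which is why the second report shows only the four requested statistics.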
The class statement in proc means allows us to do that. I'll use the class statement and provide the column BasinName. The class statement, in a way, groups the data, so we will calculate the statistics for each unique value of BasinName. However, the class statement has one nice advantage over the by statement that we previously used for grouping data in proc print: it doesn't require that we sort the data ahead of time. Let's see what it looks like. Notice we have one row per BasinName and the same statistics reported. By default we also include the N Obs column, which tells us how many observations there were in the data for each of those values of BasinName.
Back in the program, I'm not limited to one column on the class statement. I could group by multiple columns. What if I add StormType? If I run this program, you'll notice there's a single table created for the combinations of BasinName and StormType with the corresponding statistics.
But what if I'd like to look at different combinations of these class variables? In the proc means syntax, there's an additional statement that we can use, called the ways statement. This allows us to control how the values of the classification variables are used to segment the data. For example, ways 1, semicolon. Let's see what that produces. Notice that it creates two separate tables. Each table uses the values from one of the class columns, so the first table represents the values coming from the column StormType, and the second table segments the data based on the values of BasinName.
Let's look at one other example with the ways statement. I could add additional values, such as zero. I'll leave one, and I'll add two. What would this produce? When ways is zero, we use zero classification variables to segment the data, or in other words, statistics are calculated for the entire table. That's what the first table represents. When ways is one, we have the two separate tables, one for StormType and one for BasinName. And finally, when ways equals two, we use the combination of the two columns, BasinName and StormType, to calculate the summary statistics.
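The grouping steps in this demo can be sketched along these lines. Again, this is only a sketch under assumptions: the table name pg1.storm_final and the maxdec=0 value are inferred from the narration rather than shown in it.

```sas
/* One class column: one row of statistics per BasinName value. */
/* Unlike a by statement, class needs no prior sort.            */
proc means data=pg1.storm_final mean median min max maxdec=0;
    var MaxWindMPH;
    class BasinName;
run;

/* Two class columns: a single table with one row for each */
/* combination of BasinName and StormType                  */
proc means data=pg1.storm_final mean median min max maxdec=0;
    var MaxWindMPH;
    class BasinName StormType;
run;

/* ways 0 1 2: overall statistics (0), each class column on its */
/* own (1), and the two-column combination (2)                  */
proc means data=pg1.storm_final mean median min max maxdec=0;
    var MaxWindMPH;
    class BasinName StormType;
    ways 0 1 2;
run;
```

Changing the last statement to `ways 1;` would keep only the two single-column tables, one for StormType and one for BasinName.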
