Learn how to calculate stratum-specific summary statistics for sleep duration, and add these results to the continuous Table 1.
- [Instructor] We are almost done with the basic descriptive analysis. We are now at chapter five, section seven, where we will be completing continuous Table 1. For this one, we'll use the plyr package. And we'll use the command ddply, for means and standard deviations. Like before, we'll define a macro to output the CSVs, and then we'll copy/paste the results into our Table 1. Let's go to our continuous Table 1. This Table 1 is considerably simpler, compared to categorical Table 1.
Our goal is to just get the mean and SD of sleep duration for each category. We also need to report the overall mean and SD at the top. So let's go to R and do some fancy coding. So we just finished with 220. So the code we are using now is 225 Table 1 means and SDs. Remember at the top of our Table 1, we need the overall mean and SD. So I wrote this code here.
As usual, we'll skip the read command because we already have analytic in memory. See the mean command? And then the variable in parentheses? And also see the SD command with the variable. Let's run these. Highlight, Control+R. See the numbers in the console? We could copy these numbers into the yellow part of the Table 1. I'll do that. I'm going to copy this mean, which is 7.11 something, and the standard deviation or SD, which is 1.4 something, into our continuous Table 1.
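The overall mean and SD step described above can be sketched as follows. This is a minimal illustration, assuming a small made-up data frame standing in for the course's analytic dataset; only the variable name SLEEPTIM2 comes from the walkthrough, and the values are invented.

```r
# Toy stand-in for the analytic dataset (assumption: invented values).
analytic <- data.frame(SLEEPTIM2 = c(6, 7, 8, 7, 5, 9))

# Overall mean and standard deviation of sleep duration.
overall_mean <- mean(analytic$SLEEPTIM2)
overall_sd <- sd(analytic$SLEEPTIM2)

print(round(overall_mean, 2))  # 7
print(round(overall_sd, 2))    # 1.41
```

If the variable contained missing values, both functions would return NA unless na.rm = TRUE were added to each call.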
Let's go look at what I did. Here they are. But then, what do we do for the means and SDs for each level? Guess what? A package. This time it's plyr. Let's look at our code. The package is plyr. I'm not sure how you pronounce it. I really love plyr. I've used so many commands from it. So we installed plyr and now I'll call it up in the library. I'll do highlight and Control+R. So ddply is the cool function we will use from plyr to get our work done on this continuous table.
So first of course, we want to see sleep means by alcohol group. Because that is our hypothesis. So we put in here our data set first, which is analytic. And then we put the tilde, and the grouping variable we want, which is ALCGRP. And then summarise with the British English spelling. But you can use the U.S. spelling too, with a Z. Then, after summarise, by writing mean=mean(SLEEPTIM2) and sd=sd(SLEEPTIM2), we name the output columns, so when we look at the output CSV, we can see the headings mean and SD.
You'll see. To start by demonstrating, I am not putting the results into an object. I'm just displaying them on the screen so you can see them. So let's highlight and run this ddply code. See this nice output? But you guessed it, I'm going to make a macro. And I'm going to want to have the macro write out all these CSVs, so we can copy/paste into the yellow part of the Table 1 spreadsheet. So let's load library(gtools), which will let us define a macro.
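The ddply call walked through above might look roughly like this. It is a sketch under stated assumptions: the plyr package is installed, and the toy data frame below (invented values) stands in for the real analytic dataset. The names ALCGRP and SLEEPTIM2 come from the walkthrough; alc_summary is a hypothetical name used here only so the result can be inspected.

```r
library(plyr)

# Toy stand-in for the analytic dataset (assumption: invented values).
analytic <- data.frame(
  ALCGRP    = c(1, 1, 2, 2, 3, 3),
  SLEEPTIM2 = c(6, 8, 7, 7, 5, 9)
)

# One row per level of ALCGRP, with the mean and SD of sleep duration.
# The argument names (mean, sd) become the column headings.
alc_summary <- ddply(analytic, ~ALCGRP, summarise,
                     mean = mean(SLEEPTIM2),
                     sd = sd(SLEEPTIM2))

print(alc_summary)
```

Running the ddply call without the assignment simply prints the summary to the console, which matches the on-screen demonstration.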
Great. Now I'm going to make a new macro called SumTbl. And I'm going to look at my code above for guidance. My new macro will only take in the arguments OutputTable for the output table, GroupVar for the grouping variable, and CSVTable for the name of our CSV table that will be written out. Okay, now I'm going to load the macro. Let's run that code. Highlight and Control+R. Okay, great, now let's call the macro. I will call it first using ALCGRP.
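A macro along the lines described above could be sketched with defmacro from gtools. This is not necessarily the instructor's exact code: the macro body, the toy data (invented values), the object name AlcTbl, and the temporary-folder path are all assumptions for illustration. Only the names SumTbl, OutputTable, GroupVar, CSVTable, ALCGRP, and SLEEPTIM2 come from the walkthrough. The sketch assumes plyr and gtools are installed.

```r
library(plyr)
library(gtools)

# Toy stand-in for the analytic dataset (assumption: invented values).
analytic <- data.frame(
  ALCGRP    = c(1, 1, 2, 2, 3, 3),
  SLEEPTIM2 = c(6, 8, 7, 7, 5, 9)
)

# defmacro substitutes the supplied arguments into the expression and
# evaluates it in the caller's environment, so GroupVar can be passed
# as a bare variable name.
SumTbl <- defmacro(OutputTable, GroupVar, CSVTable, expr = {
  OutputTable <- ddply(analytic, ~GroupVar, summarise,
                       mean = mean(SLEEPTIM2),
                       sd = sd(SLEEPTIM2))
  write.csv(OutputTable, file = CSVTable, row.names = FALSE)
})

# Hypothetical output location: a temporary folder rather than the
# course's data folder.
out_file <- file.path(tempdir(), "Alc.csv")

# Summarize sleep duration by alcohol group and write the CSV.
SumTbl(AlcTbl, ALCGRP, out_file)
```

After the call, AlcTbl exists in the workspace and the CSV holds one row per alcohol group, ready to be opened and copied into the spreadsheet.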
And of course, writing this to a table called Alc, so we can find it again. Let's run this line of code with a highlight and Control+R. Okay now let's find the CSV in our data folder. Here it is in our data folder. So let's open it. I'm sure you are guessing what we'll do. You bet. We will copy this out. Let's copy these means and these SDs by highlighting and doing Control+C. Now let's go back to our Table 1 continuous.
Let's locate the upper left corner of where our information goes. Top of the alcohol variables. And put our cursor there. Now we'll do paste values. So as you can see, I essentially call up that macro a bunch of times, and output all the CSVs. And then, using the same approach as with the alcohol variable, I copy and paste into the Table 1. Let's highlight and Control+R to run all of this.
Good, this is the end of code 225. I'll show you the completed version of Table 1 continuous. Okay, here we go. The whole yellow part is filled in. And the results are reported to the left. Whew, that was a lot of work, making all those tables. So in this section, we focused on developing code around the command ddply from the plyr package, which helped us output means and SDs into a CSV format.
And then, you are probably getting used to this, we copy/pasted from the CSVs into continuous Table 1. As I noted the bivariate test fields are still blank, in both of our categorical and continuous Table 1s, even though both tables are technically complete. In the next chapter, we'll go over how to do these bivariate tests, and complete those sections of Table 1.
This detailed, practical course is designed to help those in the fields of public health, medicine, and data science to edit, analyze, and interpret data. Learn how to code new variables, use the forward-stepwise modeling process, and document your decisions. Find out how to visualize results by generating charts and graphics, and how to add tables and figures to your documentation. This course helps equip you to independently design, develop, and execute a full BRFSS analysis, and even publish your results in scientific publications or journals.
- Reviewing survey data and documentation
- Conducting a BRFSS analysis
- Understanding naming conventions
- Editing variables
- Reviewing distributions
- Generating an analytic dataset
- Developing descriptive statistics to answer prespecified hypotheses
- Preparing publication-worthy tables and plots