From the course: Descriptive Healthcare Analytics in R

Reading in BRFSS XPT data - R Tutorial

- [Narrator] And now on to reading in the BRFSS XPT data, which we will do in this section. So here we will download and read in the data set. Okay, so let's head over to the BRFSS website and see if we can download our 2014 data. Okay, so here we are at the BRFSS 2014 Survey Data and Documentation page. Information is at the top, the data files are in the middle, and there are some resources at the end. Let's scroll up to the Data Files. You will see that they are posted in two formats: ASCII format, which can be read by any program, and SAS Transport format, which is dot XPT format. XPT is not the regular SAS data set format. That is called SAS7BDAT. XPT stands for transport. Why XPT? Well, it's not a secret that SAS7BDAT data sets are really bloated. So SAS invented this XPT format. It's kind of like a zip version of the data set, so it is smaller when posted on a website for download. And you know how sneaky R is? The foreign package we just installed allows you to read in XPT files. Can you believe it? So this is the one we will download, the XPT. Even though it's an XPT, it's a huge file. So let's get started.
While that's downloading, let's turn to our file manager so we can set up our directories. I suggest that, for your project, whether you are using SAS or R or any statistical program, you make a folder called analysis or analytics, and put only files directly related to your data analysis there. In other words, data dictionaries, other notes you take, and so on: those metadata do not belong in the analysis folder. The only things that belong in this folder would be the actual data you are using or that you output as part of the R analysis, and the actual code you make. So, within this analytics folder, I have two folders, data and code. You want to keep these separate and, in case you didn't guess, put the data in the data folder and put the code in the code folder. So that's it: an analysis folder with a data folder and a code folder in it.
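The folder layout described above could also be created from R itself. This is only a minimal sketch: the narrator sets the folders up in the file manager, and the root location used here (a temporary directory) is a stand-in so the example runs anywhere. Only the folder names analytics, data, and code come from the transcript.

```r
# Hypothetical root: a temporary directory is used so the sketch can run safely.
# In a real project you would replace tempdir() with your own project location.
project_root <- file.path(tempdir(), "analytics")

# The two subfolders named in the transcript
data_dir <- file.path(project_root, "data")
code_dir <- file.path(project_root, "code")

# recursive = TRUE also creates the parent "analytics" folder if it is missing
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(code_dir, recursive = TRUE, showWarnings = FALSE)

# Show what now sits inside the analytics folder
print(list.files(project_root))
```

Keeping data and code in separate folders from the start means the data folder can later be set as the working directory without scripts getting mixed in with data files.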
And guess where we are going to put that XPT file we are downloading? You guessed it, in the data folder. Let's copy it in. Wow, this is such a big file that even the XPT file is zipped. So let's unzip this from the downloads directory and then drag our XPT data set into the data folder we just made. Done! Okay, let's go back to R. Now, remember how I showed you how to change the directory? That's how to set a default directory for the session. Let's get R to be automatically mapped to the data folder for the session. We do that by clicking on the console and choosing File, Change dir. Now we'll choose the analytics folder and then the data folder and say OK. That way, when we read in the data, R will, by default, look in our data folder.
Here we are back at our code. At the top we call up the foreign library, but below we will create an R data set out of the downloaded data, and we have to name it. I give data sets simple, short names and then I add a letter as a suffix. So I start with "a". You will see I increment this letter each time we remake the data set, which we will do about a zillion times in this chapter. So I start out with BRFSS underscore a as a name and I put that here. Now, because we are making an object, I use the less than sign followed by the dash to make a little arrow. This command, the less than and then the dash to make a little arrow, is the universal command in R for making an object. In this case, the object we are making is called a data frame. R has other objects we'll encounter, but right now we just want to get the BRFSS data into a data frame called BRFSS underscore a.
So what do we put in BRFSS underscore a now that we have our little arrow? That's what goes on the other side of the arrow. The foreign package documentation we looked up tells us we want to use the function "read.xport". This is the function, and then we have the parentheses, and that's where I put the actual path and name of the data set, complete with the dot xpt at the end, in quotes.
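Put together, the steps just described might look something like the sketch below. The folder path and the file name are assumptions for illustration (the transcript does not spell out either one), so the code checks that the file exists before trying to read it. `setwd()` is used here as the scripted counterpart of the File, Change dir menu choice.

```r
library(foreign)  # provides read.xport(); normally bundled with R as a recommended package

# Hypothetical locations: adjust both to match your own machine.
data_dir <- "C:/analytics/data"  # assumed path to the data folder, written with forward slashes
xpt_file <- "LLCP2014.XPT"       # assumed name of the unzipped 2014 file

if (dir.exists(data_dir) && file.exists(file.path(data_dir, xpt_file))) {
  setwd(data_dir)                  # make the data folder the default directory for the session
  BRFSS_a <- read.xport(xpt_file)  # "<-" assigns the resulting data frame to BRFSS_a
  print(dim(BRFSS_a))              # number of rows and columns that were read
} else {
  message("Adjust data_dir and xpt_file to match your own setup before running.")
}
```

The existence check is only there so the sketch fails gently on a machine without the file; in the course code the `read.xport` call is run directly.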
Note the direction of the slashes: forward. That's all it takes to read in our data set. Let's run it by highlighting it and pressing Control R. It will take a little while because it's almost 500,000 rows, but it won't take long. Errors would be reported in the console, but look, no errors. Yay!
Okay, now that it is in there, it would be nice to see a list of all the variables, wouldn't it? You can do that with the "colnames" function. So let's look at the colnames of BRFSS underscore a. Let's highlight this and press Control R. Here we are. Okay, I want to call your attention to something. Remember from our data dictionary how some of the native variables we are using, that the CDC already calculated for us, started with an underscore? For example, the underscore BMI5CAT variable had that. But now, as you can see in the colnames output, R has added an "X" to the beginning of the variable name. Now it is named X underscore BMI5CAT. We have to update our data dictionary to add this nuance. I'll do that offstage. I'll show you how I update the data dictionary by using the BMI example. First, in the main dictionary, we need to find our BMI5CAT variable and add the "X" before the underscore. Next, because we called the picklist for this variable Obesity, we need to go to that tab and update the variable name that's listed there. And, for the other tabs that we actually named after the variable, like HISPANC, remember HISPANC, we'll have to add the "X" to the beginning of the name of the tab.
So congratulations, you successfully read in your data set. So, in this movie, we downloaded the xpt file from the BRFSS site. We set up our directories and put our data in one of them. And we read in our xpt file. We also took a quick look at the column names. In the next movie, I will talk about naming conventions for both our data sets and our code, so you know what to name what you just wrote.
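The "X" prefix seen in the colnames output is consistent with R's rule that a syntactic variable name cannot start with an underscore. The short sketch below illustrates that rule with base R's `make.names` and a tiny stand-in data frame. The values are made up purely for illustration, and this is not the course's code: it only mimics the renaming without needing the full BRFSS file.

```r
# Names as written in the CDC documentation, per the transcript
cdc_names <- c("_BMI5CAT", "HISPANC")

# make.names() converts strings into syntactically valid R names
r_names <- make.names(cdc_names)
print(r_names)

# A stand-in data frame with made-up values; check.names = TRUE (the default)
# applies the same renaming to the column names
demo <- data.frame(`_BMI5CAT` = c(1, 2), HISPANC = c(1, 2), check.names = TRUE)
print(colnames(demo))
```

Names that already start with a letter, such as HISPANC, are left unchanged, so only the underscore-prefixed calculated variables need the "X" added in the data dictionary.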
