Join Barton Poulson for an in-depth discussion in this video, Calculating frequencies, part of Learning R.
When you're exploring your data to make sure you meet your assumptions, or to find interesting exceptions, graphics are an excellent first step. However, most analyses also require the precision of numbers in addition to the heuristic value of graphics. Just as we started with graphics for categorical variables, we'll also start with statistics for categorical variables. The most common statistics in this case are frequencies, so that's what we'll do first. I'm going to use the same data set I've been using so far, social network. I'll come down here, and because the file is saved in my default location, which I set to be the Desktop, I can simply run this line to read the CSV file.
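As a minimal sketch of this loading step, the block below writes a small hypothetical CSV file to R's temporary directory and reads it back with `read.csv()`. The file name `social_network.csv` and the single `Site` column are assumptions for illustration. The course's actual file has five variables that aren't listed in this transcript, and in the video it's read from the working directory rather than a temporary folder.

```r
# Hypothetical file name and contents; the course's real CSV may differ.
csv_path <- file.path(tempdir(), "social_network.csv")

# Write a tiny stand-in file so the example runs anywhere:
# a header line ("Site") followed by one preferred site per respondent.
writeLines(c("Site",
             "None", "Facebook", "MySpace", "Facebook",
             "Twitter", "None", "Facebook"),
           csv_path)

# header = TRUE treats the first line of the file as column names.
sn <- read.csv(csv_path, header = TRUE)

# str() reports observations and variables, much like the Workspace pane.
str(sn)
```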
I see in the console that the command ran fine. In the top right, in the Workspace, I see that I've now loaded the data set sn, for social network; it has 202 observations of 5 variables. The next thing is to create the default table, which is a frequency table. It lists the categories in alphabetical order, and it looks like this when I run it. We have 93 people who indicated that Facebook was their preferred social networking site, 3 who chose LinkedIn, 22 who chose MySpace, and so on. This is adequate for getting the numbers.
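Here's a minimal sketch of the default frequency table, assuming a data frame `sn` with a text column named `Site`. The seven responses are made-up toy data, not the course's 202 observations; `table()` itself is base R.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)

# table() counts how often each distinct value occurs.
# The categories come out in alphabetical order.
table(sn$Site)
# Facebook 3, MySpace 1, None 2, Twitter 1
```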
On the other hand, it would be nice to be able to modify it in particular ways. That's going to be easiest if I save the table as its own object, which is what I'm going to do in line 15. I'm going to create a new table called site.freq, for frequencies of the sites, using the same command as before. So I'm just going to run it again. Now you can see that I've created this new object, and in fact, it shows up in the Workspace. It is a table with six values in it. Now I'm going to print the table just by typing its name; entering site.freq prints the table, and there it is.
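A short sketch of saving the table under its own name, again using hypothetical toy data in place of the course file. The `class()` call shows that `table()` returns an object of class `"table"` (not a data frame), which is what the later sorting and `prop.table()` steps operate on.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)

# Save the frequency table as its own object.
site.freq <- table(sn$Site)

site.freq          # typing the name prints the table
class(site.freq)   # "table"
length(site.freq)  # number of categories: 4 in this toy data
```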
It looks exactly the same as what I had before. Now I'm going to start modifying it a little. The first thing is to sort it. Sorting is a bit of a funny thing when it comes to tables: I'm going to sort the table into itself, replacing it with a sorted version. In line 18, you see that I have site.freq, which is the name of the table. Then I have the assignment operator, the left arrow (<-), which is read as "gets." So site.freq gets site.freq, but then in square brackets I say that I'm going to order it, and in parentheses I give the basis for the ordering.
In this case, I'm ordering it by the only thing in there, the counts in site.freq; the idea is that you could also order it by another variable. I'm also specifying that I want the order to be decreasing, which is why I have decreasing = T, for TRUE. I'm going to run that command, and we see that it ran in the console. Now I'm going to print the table again by just typing site.freq. Now you see that it's sorted: it starts with Facebook again, then None, and the counts go 93, 70, 22, 11, and so on.
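The sort-into-itself idiom described above can be sketched like this, with hypothetical toy data. `order()` returns the positions that would sort the counts, and the square brackets pull the table's entries out in that order. Categories with tied counts (MySpace and Twitter here) may come out in either order, so don't rely on their relative position.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)
site.freq <- table(sn$Site)

# order() gives the indices that sort the counts; decreasing = TRUE
# puts the largest count first. The result replaces the original table.
site.freq <- site.freq[order(site.freq, decreasing = TRUE)]
site.freq   # Facebook (3) first, then None (2), then the two 1s
```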
These are the counts, or frequencies: how often each one occurs. Sometimes, though, it's helpful to have the proportions or percentages, and that's very simple to do with R's built-in functions. I'm going to use the prop.table function, which is short for proportions table. I tell it what I need the proportions of, and that's site.freq, which I saved as a table, so it works on this one. When I run that command, you see that I have the same labels (Facebook, None, MySpace) in the same order, with proportions under them.
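A minimal sketch of `prop.table()` on a saved frequency table, using hypothetical toy data. With no margin given, it divides each count by the grand total, so the results run from 0 to 1 and sum to 1.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)
site.freq <- table(sn$Site)

# Each count divided by the total number of responses (7 here).
site.props <- prop.table(site.freq)
site.props

sum(site.props)   # the proportions add up to 1
```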
Proportions go from 0 to 1, where 0 is 0% and 1 is 100%. The one problem with this list is that it has far too many decimal places. To get it down to two decimal places, I have one more command to run. I take the command I just ran in line 21 and wrap it in round, which tells R to round the result, and at the very end you see comma, 2; that means two decimal places. When I run that command, that's basically how I want it to look.
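Wrapping the proportions in `round()` might look like this, again with hypothetical toy data. The second argument to `round()` is the number of decimal places to keep.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)
site.freq <- table(sn$Site)

# round(x, 2) keeps two decimal places.
site.props.2 <- round(prop.table(site.freq), 2)
site.props.2
# Facebook 0.43, MySpace 0.14, None 0.29, Twitter 0.14
```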
Now what I have is proportions. It says that 46% of the respondents indicated that Facebook was their preferred social networking site. In this particular data set, only about 1% each chose LinkedIn or Twitter. Depending on your purposes, you may want to report the proportions, or you may want to report the counts, or frequencies, from earlier. Usually you'd actually want to do both. The nice thing is that the table command in R makes both of those simple.
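Since you'd usually report both, one way to put counts and percentages side by side is a small data frame. This is a sketch using hypothetical toy data. The column names `site`, `count`, and `percent` are illustrative choices, not anything the course prescribes. For a one-dimensional table, `names()` returns the category labels.

```r
# Hypothetical stand-in for the sn data frame.
sn <- data.frame(
  Site = c("None", "Facebook", "MySpace", "Facebook",
           "Twitter", "None", "Facebook")
)
site.freq <- table(sn$Site)

# One row per category: the raw count plus the percentage to one decimal.
freq.report <- data.frame(
  site    = names(site.freq),
  count   = as.vector(site.freq),
  percent = round(100 * as.vector(prop.table(site.freq)), 1)
)
freq.report
```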
The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.
- What is R?
- Installing R
- Creating bar charts for categorical variables
- Building histograms
- Calculating frequencies and descriptives
- Computing new variables
- Creating scatterplots
- Comparing means