Join Barton Poulson for an in-depth discussion in this video Calculating frequencies, part of R Statistics Essential Training.
Once you've done the graphical analysis of your data, you'll want to move on to numerical descriptions. The simplest way to start is to calculate frequencies for a categorical variable, and that's what we're going to look at in this movie. We're going to start with a small dataset that I got by looking up the number of hits for color words in Google. These are just words I came up with off the top of my head. And all I did to get this data, by the way, was go to Google and type in the words. So, for instance, here's orange, and I took this number here, the number of results I got, as my data. And as you can see, the search obviously brings up a lot of things that are not colors.
We have Orange County. We have the cell phone company. But I took those numbers and put them here, and we're going to create a dataset from them. What's happening here is that I'm using the c() function, which concatenates, or combines, values. Then rep(), short for repeat, means I'm actually repeating the word blue 3,990 times. By the way, these counts are in millions of hits, so that's nearly 4 billion hits. We're going to end up with a rather long dataset with just these five color names in it. I'm going to highlight that block and run it, so now I've got 14,000 lines of data.
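Here is a minimal sketch of the c() and rep() pattern described above. Only the 3,990 count for blue comes from the video; to keep the example small, the counts below are hypothetical toy values (the video's counts are millions of Google hits), and the entry order blue, red, orange, green, purple is an assumption.

```r
# Build one long character vector by repeating each color name.
# Counts are hypothetical toy values; the video uses, e.g., rep("blue", 3990).
groups <- c(rep("blue", 4),
            rep("red", 5),
            rep("orange", 2),
            rep("green", 3),
            rep("purple", 1))

length(groups)  # one element per "hit"
head(groups)    # the first few values, in entry order
```

rep() also accepts a vector for its times argument, so rep(c("blue", "red"), times = c(4, 5)) builds the same kind of vector in a single call.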
Now I'm going to create a frequency table. In one sense, I already have a frequency table right here, but it's not in the right format, and I can't do analyses with it. So I'm going to use the table() function: table(groups). I'm going to save the result to an object so I can call on it later. I'll run line 14, and now you can see the object in the workspace at the top right. Next I'm going to print the table, which just shows it in the console here. And there it is. Now, one interesting thing about this is that it's sorted alphabetically.
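A sketch of the table() step, using the same hypothetical toy counts. The object name groups.t1 is an assumption, inferred from the video's later groups.t2.

```r
# Hypothetical toy counts, entered in the order blue, red, orange, green, purple
groups <- rep(c("blue", "red", "orange", "green", "purple"),
              times = c(4, 5, 2, 3, 1))

# table() counts each distinct value; categories come out in alphabetical order
groups.t1 <- table(groups)
print(groups.t1)  # blue 4, green 3, orange 2, purple 1, red 5
```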
That's not the order in which I entered the data. I entered blue, red, orange, and so on, but here it comes out as blue, green, orange. It's been changed to alphabetical order. There may be times when you want that, but I find it normally works best to list things in descending frequency. To do that, I'm going to sort the table and save the result as a new table. I use the sort() function, tell it what I'm sorting, and then set decreasing = TRUE. I'm saving the result as a new object. You don't have to, but that's what I'm doing here.
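The sorting step can be sketched as follows, again with hypothetical toy counts and the assumed name groups.t1 for the unsorted table.

```r
groups <- rep(c("blue", "red", "orange", "green", "purple"),
              times = c(4, 5, 2, 3, 1))  # hypothetical toy counts
groups.t1 <- table(groups)

# Sort by count, largest first; the result is still a table
groups.t2 <- sort(groups.t1, decreasing = TRUE)
print(groups.t2)  # red 5, blue 4, green 3, orange 2, purple 1
```

Because sort() returns a table here, each category name stays attached to its count.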
I'm going to run that line. You can see that groups.t2, for table two, has shown up on the right, and we can print it to the console. Let me just make this bigger. You can see it has the same information, but now it's sorted in descending frequency, so red comes first and purple comes last. On the other hand, maybe you want to present something other than raw frequencies. Proportions are often more helpful, so what I'm going to do is use the prop.table() function, which creates a table of proportions, and just tell it to use groups.t2, because that preserves the order I set there.
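A sketch of the proportions step. prop.table() divides each count by the total; the counts are hypothetical toy values and the name groups.prop is illustrative only.

```r
groups <- rep(c("blue", "red", "orange", "green", "purple"),
              times = c(4, 5, 2, 3, 1))  # hypothetical toy counts
groups.t2 <- sort(table(groups), decreasing = TRUE)

# Each count divided by the total (15 here), so the values sum to 1
groups.prop <- prop.table(groups.t2)
print(groups.prop)
```

Passing the sorted table means the proportions keep the descending order.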
Now there are proportions that go from zero to one. They're similar to percentages, although these have too many decimal places. So I'm going to wrap that command in round(), which rounds to a given number of decimal places; right here, I just specify how many. I'll run line 23, and now we're down to a more readable format. But, as I think I've shown before, it can be even nicer to multiply those numbers by 100 to get rid of the leading zeros and the decimal places. It doesn't add the percent sign, but it reads essentially like percentages.
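A sketch of the rounding and percentage steps with the same hypothetical toy counts. The exact expression used in the video isn't shown, so round(prop.table(groups.t2), 2) * 100 is an assumption based on the description.

```r
groups <- rep(c("blue", "red", "orange", "green", "purple"),
              times = c(4, 5, 2, 3, 1))  # hypothetical toy counts
groups.t2 <- sort(table(groups), decreasing = TRUE)

# Proportions rounded to two decimal places
print(round(prop.table(groups.t2), 2))  # red 0.33, blue 0.27, green 0.20, orange 0.13, purple 0.07

# The same values as whole-number percentages (no % sign is added)
pct <- round(prop.table(groups.t2), 2) * 100
print(pct)
```

Multiplying already-rounded decimals can leave tiny floating-point residue (for example, 0.07 * 100 is not exactly 7), although it doesn't show when printed; round(100 * prop.table(groups.t2)) gives the same whole numbers here without that residue.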
And I think that's the easiest one to read. Of all the hits for these five color names added together, 28% were for red, 27% for blue, and 6% for purple. Anyhow, that's all there is to creating tables of frequencies, proportions, or percentages in R. Very simple, and very clean.