Join Barton Poulson for an in-depth discussion in this video, Creating bar charts for categorical variables, part of R Statistics Essential Training.
Once you've got your data into your program, the best way to start looking at it is by literally looking at it. We're visual animals. We like to see what's there. By looking, you can pick up patterns and differences between groups that even the most sophisticated numerical analyses may not be able to capture. The very first chart that we are going to look at is the most basic of all: the bar chart for a single categorical variable. Let's start by saying that if you want help on plots in general, R has excellent built-in help. You just type ?plot to get help on plots, and that brings up Generic X-Y Plotting, and there is a lot of information you can get in there.
In this example, we're going to use one of the built-in data sets. Although the datasets package is usually installed and loaded by default, just in case it's not there, we're going to use this line, require("datasets"). I'll run that one, and then after that, we're going to open up a data set called chickwts. Now, to get information about this data set, you type ?chickwts, and it's 71 observations on the weight of chicks given several different kinds of feed. If you want to see the actual data, you just type the name of the data set, because it's one of the built-in data sets.
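As a rough sketch of the inspection steps just described (the calls to str(), head(), and levels() are additions for illustration and are not mentioned in the video):

```r
# Load the datasets package (normally attached already; harmless to repeat).
require("datasets")

# Open the help page for the data set (interactive sessions only).
# ?chickwts

# Quick structural checks before plotting.
str(chickwts)           # overall structure: variables and their types
head(chickwts)          # first six rows, one row per chick
levels(chickwts$feed)   # the distinct feed types
```

Typing chickwts on its own prints all rows, which is manageable here only because the data set is small.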
Here we go, I'll make this bigger. We have weights, and we've got several different kinds of feed listed. Okay. Let's load the data into the workspace. There we go. The easiest way to make this plot is with the generic X-Y plotting function, plot(). I tell it that I'm going to use the data set chickwts, and then the dollar sign means I'm going to use a variable, feed, from that data set. If I run line 17 here, I get my bar plot. It's here on the right. Now, you only see every other label because it's small. If we zoom in on it, now you can see all of them.
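A minimal sketch of this step, assuming the data is loaded with data() (the transcript does not say which command was used):

```r
# Copy the built-in data set into the workspace (assumed loading step).
data(chickwts)

# feed is a factor, so the generic plot() draws one bar per level,
# with bar height equal to the number of rows at that level.
plot(chickwts$feed)
```

Because plot() chooses its behavior from the type of its argument, the same call on a numeric variable would give a different kind of chart.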
And, you know, it's not a bad chart. It's very basic, but a bar chart is a basic thing, and it's pretty clear that we have more soybean than we have horsebean. What I want to show you is ways of modifying this. That's one of the beauties here: you can change the graphs to make them work the way that you think would be most informative to your audience. You can get the information on plotting again here, but we're going to look at another function. It's called barplot(). It offers more control, but you have to prepare the data. It can't take the data the way it exists right now in chickwts, because that has one row per observation.
See, right here you've got case number 71: 332 grams, I imagine, and then what it ate. But you can't use that kind of data for the barplot() function. In this respect, it's a lot like Excel. Excel cannot produce bar charts from these raw data. You have to have summary tables for it to work. And so we are going to create a summary table, and then R is going to be able to make the bar plot from that table. So what I'm going to do is create a new object called feeds. It's a table where I take the data set chickwts, and I take the variable feed from it.
So that's chickwts$feed. Then I'm going to use the table() function and put the result into an object called feeds. So I'm going to run line 24 here, and you see that feeds is now showing up in my workspace on the top right. The next thing I'm going to do is take just a quick look at that feeds object. And here we have it: we've got our six different feeds and how many chicks received each of them, so we see that 14 received soybean, 10 received horsebean, and so on. Now what I can do is make a bar plot of that. By just doing barplot(feeds), I'm going to get a plot that looks exactly the same as the current one.
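The summary-table step can be sketched as follows, using the object name feeds from the narration:

```r
# Count how many chicks received each feed.
feeds <- table(chickwts$feed)

# Print the frequency table: one count per feed type.
feeds

# barplot() needs these counts rather than the raw rows.
barplot(feeds)
```

The table is a named vector of counts, so individual values can be pulled out by name, for example feeds[["soybean"]].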
I'm going to run line 26. You can tell that it's a new plot because I can go back and forth here, although they look exactly the same. But the nice thing is that I can modify a bar plot. In fact, if you look at the help for barplot(), you see there are a lot of things you can control here. The first thing I'm going to do is put the bars in descending order. Usually, when you have a bar plot, you want to have the highest bars to the left and descending values to the right, unless there's some sort of inherent order in your variable. Or, if you're doing it horizontally, you want the highest bar at the top, with the bars getting smaller as you come down.
That usually works best. So, we're going to do that by using barplot() and then feeds. Then I've got this really long statement here that specifies the order that I want to use. Now, the order statement has to be in square brackets, and I'm telling it that I'm going to order it; that's the first thing. Then I'm telling it how I'm ordering it, and I'm going to order it by feeds. Now that may seem redundant, because I have feeds right here and I have feeds right here. You can order a variable by itself, which makes perfect sense, but the flexibility this gives is that you can order one variable by another variable, which is a really great way of looking at the relationships between your variables and can really increase the power of your analyses.
But we're going to be ordering feeds by itself, and then I have to tell it decreasing = TRUE. Now please note that TRUE is written in all capital letters. It has to be. TRUE is a reserved system value. You can use a capital T for it, but T is actually an ordinary variable that just happens to hold the value TRUE by default, and because T is a variable, you could write over it and replace it with something else. That could create some confusion. It's usually best to write out the word TRUE, or the word FALSE, in all capital letters.
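Here is one way the ordering step and the TRUE versus T point might look in code. The names idx and sorted_feeds are illustrative only and do not come from the video:

```r
feeds <- table(chickwts$feed)

# order() returns the positions that would sort its argument;
# indexing feeds with those positions rearranges the table.
idx <- order(feeds, decreasing = TRUE)
sorted_feeds <- feeds[idx]
barplot(sorted_feeds)

# T is an ordinary variable, so it can be overwritten; TRUE cannot.
T <- 0        # legal, and now T no longer means TRUE
isTRUE(T)     # FALSE
rm(T)         # removes our copy; base R's T is visible again
isTRUE(T)     # TRUE
```

Ordering one variable by another follows the same pattern: the vector inside order() does not have to be the one being indexed, as long as the two have the same length.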
Anyhow, this is what I'm going to do right now. I'm going to make a bar plot where I change the order. So I'm going to run line 30, and there we go. If I zoom in, then you can see all these labels on the bottom. This is better. It's in order, and so it's already less chaotic. That's a good thing. But what I'm going to show you is that there's a lot you can do to customize the chart. The first thing I'm going to use is the par() function; that stands for parameters. Parameters control a huge number of things, and so much of the flexibility of R's graphics comes from the par() function.
In this particular one, I'm going to be manipulating the outside margins, that's oma, and I'm going to be manipulating the plot margins, that's mar, for the margins themselves. What I'm doing is specifying, in lines of text, how big each of the margins should be. I give four numbers because they set the bottom, left, top, and right in each case. You can get at these by just a little trial and error. It took me three or four times to get to these values. I'm going to highlight both of these at once, and then, by pressing Cmd+Enter, I'm going to run them both at once.
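The two margin settings might be written roughly like this. The numbers are illustrative guesses, since the transcript does not state the values actually used, and saving the old settings in old_par is an addition so the change can be undone:

```r
# par() returns the previous values of whatever it changes,
# so keeping them makes it easy to undo the change later.
old_par <- par(oma = c(1, 1, 1, 1),   # outer margins: bottom, left, top, right
               mar = c(4, 5, 2, 1))   # plot margins:  bottom, left, top, right

par("mar")    # confirm the new plot margins

# ... draw plots here ...

par(old_par)  # restore the earlier settings when finished
```

Both oma and mar are measured in lines of text, which is why small whole numbers are usually enough.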
Those are now going to affect every plot I make from now on, unless I set them back to the defaults. Okay, now what I have is a whole lot of code. These are the two par() statements that I just ran. I want you to see the rest here. I'm using barplot(), and I have a lot of things going on. I'm going to chart feeds, and I'm going to order it by feeds. By the way, I don't have to put decreasing = TRUE, because I'm going to be doing it horizontally and I actually want it to be increasing, which is the default. I'm going to draw it horizontally.
I actually prefer horizontal bars because they put the numerical axis in the same direction it would be in on, say, a histogram or box plot or something like that. Then I have this other argument, las, which is the orientation of the axis labels. One means always horizontal. I'm going to be changing the colors with col, and I'm using several different colors, one color for each of the bars. I'm using the color names, and I got them by looking at the color name chart. I could use ColorBrewer or a built-in palette, but I thought this was a fun way to do it.
But they're all in a single statement here. And there I've got my col. I'm going to go to the next one. I'm turning off the borders around the bars. I'm going to put a label on the top, a title. This little character sequence right here, a backslash and an n, is an escape code that means a line break. It doesn't print; it just splits the title into two lines. And then finally, xlab is the x label, and it's going to be the number of chicks, because I'm going to have the numbers across the bottom. So, what I'm going to do now is highlight that entire block of code, and then press Cmd+Return to run that entire block at once.
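Putting the arguments described above together gives something like the sketch below. The argument names (horiz, las, col, border, main, xlab) come from the narration, but the specific color names, title text, and margin values are assumptions chosen for illustration, not taken from the video:

```r
feeds <- table(chickwts$feed)

# Illustrative margins (bottom, left, top, right); adjust by trial and error.
old_par <- par(oma = c(1, 1, 1, 1), mar = c(4, 5, 2, 1))

bar_mids <- barplot(
  feeds[order(feeds)],   # increasing order; horizontal bars are drawn bottom-up
  horiz  = TRUE,         # draw the bars horizontally
  las    = 1,            # axis labels always horizontal
  col    = c("beige", "blanchedalmond", "bisque1",
             "bisque2", "bisque3", "bisque4"),  # light (bottom) to dark (top)
  border = NA,           # no outline around the bars
  main   = "Frequencies of Different Feeds\nin the chickwts Data Set",  # \n = line break
  xlab   = "Number of Chicks"
)

par(old_par)             # put the margins back
```

Horizontal bars are drawn starting from the bottom of the plot, so the default increasing order is what places the longest bar at the top.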
And now you see I have a very different looking chart. Let's zoom in on that one. It's basically in shades of brown, going from very dark to very light at the bottom. We have the number of chicks listed across the bottom. We've got a title that's split across two lines. I think it's a very easy chart to read, and I think it's reasonably attractive. So that's an improvement. Again, if I back up, that's what it looked like before, and that's what it looks like now. That's one of the really nice things about this. So, if you want more information about the parameters, you can go to ?par. And there they are.
And that finishes the basic chart for a categorical variable, a bar chart. We're going to take a look at some variations. We're going to look at a pie chart in the next one, although I'll tell you, you actually don't really want to do that. And then we'll look at how to create a series of other charts for different kinds of data. But right now, this should serve as an excellent introduction to what you can do with R and the possibilities you have.
- Installing R on your computer
- Using the built-in datasets
- Importing data
- Creating bar and pie charts for categorical variables
- Creating histograms and box plots for quantitative variables
- Calculating frequencies and descriptives
- Transforming variables
- Coding missing data
- Analyzing by subgroups
- Creating charts for associations
- Calculating correlations
- Creating charts and statistics for three or more variables
- Creating crosstabs for categorical variables
Skill Level: Intermediate
Q: The R files within Chapters 01 to 10 don't appear to have any code in them. Where is the final code for each file?
A: Look in the "final" folder for each video. These folders contain the final R code written by the author.