Join Barton Poulson for an in-depth discussion in this video, Creating histograms for quantitative variables, part of Learning R.
In the last movie, I started by saying how important it is to screen the variables as you enter them by making charts, as a way of checking that you entered them correctly and that you are meeting the assumptions of the statistical procedures that you intend to use, and as a way of giving you an idea of what's interesting or unusual in your data set. We looked at bar charts, which are good for categorical variables. When you have a quantitative variable, something that has been measured at the interval or ratio level, like age, or time, or income, then you want to use a different approach.
The two most common forms of graphics you want to use in that case are histograms, like bell curves, and box plots. In this particular movie, we're going to look at histograms. Now, the nice thing about histograms is that, unlike bar charts, R has a built-in function for this one that does not require you to do any sort of pre-parsing of the data. I'm going to use an example here of the social network data that I've used before. I'm just going to scroll down here, and read in that data set. You can see in the workspace I've got a data frame, that's sn, for social network.
It's got 202 observations of 5 variables. And then I just come right down here, and I'm going to make a histogram of the variable age, so I'm going to look at the distribution of the age of respondents. I use hist, that's the function, and within the parentheses, I specify the data frame, that's sn, and then the dollar sign, and then I give the variable name. Now, I should mention, it is possible to use a function in R called attach, which means you attach a data set, and then you can refer to it in a shorthand way.
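As a minimal sketch of the call just described: the real social network file is not shown in this transcript, so the small data frame below is a stand-in, and the column name `Age` and its values are assumptions made only for illustration.

```r
# Stand-in for the social network data. The real file is not reproduced here,
# so the column name `Age` and these values are assumptions.
sn <- data.frame(Age = c(19, 23, 25, 31, 34, 38, 42, 47, 55, 63))

# Default histogram: name the data frame, then $, then the variable.
# hist() draws the plot and also returns the binning it used.
h <- hist(sn$Age)

# Every observation lands in exactly one bin.
sum(h$counts) == nrow(sn)
```

With no other arguments, the title and the x-axis label are both built from the expression `sn$Age`, which is why the default plot repeats that text at the top and the bottom.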
You can just give the variable names, because it knows you're referring to that particular data set. The problem with attach is that it really sets the stage for a lot of unfortunate errors, where you have more than one data set open and you get confused about what's doing what. And so, for instance, when I talked about Google's R Style Guide, they just said don't use attach, ever. So, what I'm doing here is explicitly saying what the data frame is, and what the variable is. Anyhow, I'm going to make a histogram of age, and all I have to do is run that one line, line 15.
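To make the risk with attach concrete, here is a small sketch. The two data frames, `survey_a` and `survey_b`, are hypothetical and exist only to show what happens when two attached data sets share a variable name.

```r
# Two hypothetical data sets that both have a column called Age
survey_a <- data.frame(Age = c(20, 30))
survey_b <- data.frame(Age = c(50, 60))

attach(survey_a)
attach(survey_b)          # R notes that Age from survey_a is now masked

masked_mean <- mean(Age)  # silently uses survey_b, the most recently attached

# Clean up so the bare name Age no longer resolves to either data set
detach(survey_b)
detach(survey_a)

# Explicit references leave no doubt about which data set is meant
mean_a <- mean(survey_a$Age)
mean_b <- mean(survey_b$Age)
```

Here the bare `Age` resolves to `survey_b`, so `masked_mean` equals `mean_b` rather than `mean_a`. Writing the data frame and the dollar sign each time avoids that ambiguity.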
There we have the default histogram. You see, for instance, it says histogram, and then it gives my funny title there on the top, and runs it again at the bottom. And this is sort of an outline version of what we have. I'm going to make just a few modifications to this; not very many. I once tried removing the borders. You can do that, but it looks silly, so I left that out. I'm going to change the color to beige; actually, a very light color. Light beiges and yellows are good at getting people's attention without being overwhelming.
You can specify colors in a few different ways. This one is a named color, so I put col, for color, and then in quotes I put the word beige. That's referring to a specific one. There's another way to refer to it, and that is by number: colors in R also have numbers from 1 to 657, I believe, and beige is number 18. The way that you would specify it in that case is with this line. I would put col, for color, and then colors with its parentheses, referring to the set of colors, and then in the square brackets, I just say index number 18.
That would get the same color, but I'm going to make it beige, and then I'm going to put a title on the top. That's main, that's for the main title, and it's a long one, so I'm just going to scroll to the end here for a moment. And the backslash n breaks it into two lines. I'm going to go back to the beginning, and then I'm going to have an X label at the bottom, underneath the age axis, where it's just going to say age of respondents. So, what I do now is I highlight these lines, and I run those. Now you'll see I have a little bit of a fill, just to make it pop out a tiny bit.
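The customized call described here might look like the sketch below. The stand-in data frame, the column name `Age`, and the wording of the title are assumptions, since the actual script and data are not reproduced in this transcript.

```r
# Stand-in data; the column name `Age` and the values are assumptions.
sn <- data.frame(Age = c(19, 23, 25, 31, 34, 38, 42, 47, 55, 63))

h <- hist(sn$Age,
          col  = "beige",   # named color; col = colors()[18] gives the same fill
          main = "Distribution of Ages\nin a Stand-in Sample",  # hypothetical title; \n starts a second line
          xlab = "Age of respondents")

# The named and the numbered form point at the same color.
same_color <- colors()[18] == "beige"
```

The borders are left at their default, as in the narration; only the fill, the main title, and the x-axis label change.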
I have an interpretable title at the top. I've got a label under the age axis that makes sense, and that's really enough for what I need to do. That's a functional, useful histogram, and again, like bar charts, there are about a million options that you can use to modify a histogram in particular, and the graphics parameters in general. You can explore those, but this is sufficient for getting started. By the way, I just wanted to add something about R's color palette. If you want to, you can actually see the palette by going to this Web address. I'm going to copy that, and I'm going to go to a Web browser, and we get a large chart.
This is just the beginning of it, and it shows what all the colors are. If you click on the PDF, it's several pages long. It gives the numbers for the colors, and then sorts them, and then gives the individual names for each one of them. For instance, there's the beige that I used just a moment ago. You can also get the hex codes and the RGB codes if you want. I'm going to go back to R now, and just show one other thing. By writing colors, and then 18 in square brackets, I'm referring to item 18 in that array. If I run that line, what it does down here is say that color number 18 is beige, and I can also specify several by putting them in a concatenated array.
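A short sketch of those lookups follows. The index 18 comes from the narration; the other index numbers are arbitrary picks for illustration, not the ones used on screen.

```r
# colors() returns the full vector of built-in color names.
all_names <- colors()

# A single color by its index number; 18 is beige.
one_color <- all_names[18]

# Several at once, with the index numbers combined by c().
# These three numbers are arbitrary choices for the example.
several <- all_names[c(18, 100, 400)]

# Going the other way: find the index number for a given name.
beige_index <- match("beige", all_names)

print(one_color)
print(several)
print(beige_index)
```

The last line is handy when you know the name from the chart and want the number instead.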
When I do that and run it, it tells me the color names for each one of the numbers that I put in. Anyhow, those are some of the options that you can use in customizing your histograms as a way of exploring the quantitative data, and getting you ready for further analyses. In the next movie, we're going to look at another chart that is very useful for quantitative variables, and that's the box plot.
The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.
- What is R?
- Installing R
- Creating bar charts for categorical variables
- Building histograms
- Calculating frequencies and descriptives
- Computing new variables
- Creating scatterplots
- Comparing means