The five minutes you spend each week will provide you with a building block you can use in the next two hours at work. Review language basics, discover methods to improve existing R code, explore new and interesting features, and learn about useful development tools and libraries that will make your time programming with R that much more productive.
All series code samples can be downloaded at https://github.com/mnr/five-minutes-of-R.
Skill Level Intermediate
- [Instructor] Histograms are a tremendously useful way to visualize data, and R provides a ridiculously easy way to produce them. Let's take a look at 'em. The easiest way to do it is to type hist(ChickWeight$weight), which graphs a histogram of the weight of the chickens in the built-in ChickWeight data set. I hit Return and I am immediately presented with the histogram. Across the bottom is the weight, and up the left is frequency, the number of observations that fall within each weight range. hist has provided us with what it thinks are reasonable breaks for the bars.
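As a minimal sketch of the call just described: ChickWeight ships with base R, so nothing needs to be installed. The title, axis label, and variable names here are illustrative additions, not part of the narration.

```r
# ChickWeight is a built-in data set; weight is the body weight in grams.
weights <- ChickWeight$weight

# hist() draws the plot and invisibly returns an object of class "histogram".
h <- hist(weights,
          main = "Chick weights",   # illustrative title
          xlab = "Weight (g)")      # illustrative axis label

h$breaks  # the bin edges hist() chose on its own
h$counts  # the number of observations in each bin
```

Keeping the returned object is optional, but it is a handy way to inspect the breaks and counts behind the picture.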
There are ways to modify this particular histogram, so let's go in and take a look. We can change some of its graphic elements. Here's our previous command, hist(ChickWeight$weight), and we can change the shading of the bars with the density argument. Let's type in density = 30, and that fills each bar with diagonal shading lines. It just gives us a little bit more visual detail on where those bars are located. Now, if we don't like the way that hist has broken our data apart, we can change those breaks. To do that we use the breaks argument, B-R-E-A-K-S, equals, and then we give it a vector of the numbers that we would like to break at.
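A short sketch of the shading option, assuming the same built-in data. The angle and col arguments are extras added here for illustration; only density = 30 comes from the narration.

```r
# density sets the number of shading lines per inch inside each bar.
h_shaded <- hist(ChickWeight$weight,
                 density = 30,        # value used in the narration
                 angle   = 45,        # slope of the lines in degrees (the default)
                 col     = "steelblue")  # illustrative color for the shading lines
```

Larger density values pack the lines closer together; leaving density unset draws the bars without shading lines.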
The breaks have to cover the whole range of the data, so we'll start at zero. Think of the breaks as the lines between the histogram bars. We're gonna put one at 110, one at 200, and another one at the maximum value, max(ChickWeight$weight), and that gives us three bars. Again, remember that the values you've placed in breaks indicate where the lines are going to go. So we have a line that starts at zero, a line at 110, a line at 200, and then a line at the maximum of ChickWeight$weight.
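Here is one way the custom breaks could be written out. The variable names are hypothetical, and the second call is an added illustration of what happens when the breaks do not span the data.

```r
w <- ChickWeight$weight

# Four break points give three bars: 0-110, 110-200, and 200-max.
my_breaks <- c(0, 110, 200, max(w))
h3 <- hist(w, breaks = my_breaks)

h3$counts    # observations in each of the three bars
h3$equidist  # FALSE, because the bars have different widths

# Breaks that start above the smallest weight leave some values uncounted,
# so hist() stops with an error; tryCatch() captures the message instead.
too_narrow <- tryCatch(
  hist(w, breaks = c(50, 110, 200, max(w))),
  error = function(e) conditionMessage(e)
)
```

Because these bars are not equally wide, hist() plots density on the vertical axis by default rather than raw frequency, so bar area, not bar height, reflects the share of observations.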
Now, we can tell hist to use a function to calculate the breaks, and a simple one is something called fivenum. Let's take a quick look at that. It's fivenum, and if we give it a vector of values, let's use ChickWeight$weight, what it produces is the minimum, the lower hinge (roughly the first quartile), the median, the upper hinge (roughly the third quartile), and the maximum of ChickWeight$weight. Now I can incorporate that into hist by typing in hist(ChickWeight$weight and setting breaks equal to fivenum(ChickWeight$weight).
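The two steps above can be sketched as follows; the variable names are illustrative.

```r
w <- ChickWeight$weight

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(w)
five

# Use the five numbers as bin edges, which yields four bars.
h5 <- hist(w, breaks = five)
h5$counts  # observations between each pair of neighbouring summary values
```

Since the first and last break are the minimum and maximum of the data, every observation is guaranteed to land in a bar.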
And what you'll see now is four bars of unequal width, each holding roughly a quarter of the observations, so the bars get shorter as they get wider. You can see that the lines have been placed at 35, at 63, at 103, 164 and 373, which are the values that came back from fivenum. So again, R has a built-in histogram plotting function that's tremendously easy to use and tremendously useful for visualizing data very quickly.