Join Barton Poulson for an in-depth discussion in this video, Getting started with the R environment, part of Learning R.
Let's start by taking a look at R when it first opens. For this course, we'll be using the RStudio interface, so I'll begin by double-clicking that icon on the Desktop. If you want to use the default R application, just double-click on the appropriate icon. R is the 32-bit version for older computers. R64 is the 64-bit version, which most people will want, and that will become the default in R 3.0. Either way, once they're open, they appear identical. Also, if you prefer to work in other environments, you have other choices.
So, for instance, on a Macintosh, you can open up the terminal, and access R that way by simply typing the letter R at the command prompt. Similarly, in Linux, type R at the command prompt, or you can set it up to use the text editor of your choice through the preferences or options. When you first open R, what you get is the console. That's what I have here on the left, and it comes up with a bunch of boilerplate text. It tells me, for instance, the version that I'm using, it gives information about the license, it gives information about contributors, and citation, and also how to get some demos or help, and how to quit R in the console.
In RStudio, it's easy to resize the windows by simply dragging the dividing line. Right here, I can make it smaller, or larger, and while the console is where the action happens in R, it's not the place where you want to be working. Instead, you want to be working in a script environment, because you can save that. Also, I want to clear the console first. On Mac and PC in RStudio, that's just Ctrl+L, or you can go up to Edit, and down to Clear Console. I'm going to use the Ctrl+L. It just clears out all the text, and then I'm going to open up a script.
Now, you can either open a new one by coming up to File > New > Script, or you can click on this Menu option right here to create a new script. I've already written a script for this movie, so I'm going to open that by going up to this icon right here to open an existing file. I'm going to come down to where I have it; I'm in the Desktop; Exercise Files. This is chapter 2, movie 3, and there's the file.
I'm going to double-click on that, and it opens up in RStudio. Now, I want to point out that there's a lot of code in this one, but almost all of it is comments. Anything that begins with the hash tag, or the number sign, and shows up in a light green here is a comment and it's not run. The actual coding is in the blue and the grey, you'll see. I can run each line here, and it will show up in the console one at a time. So, for instance, what I'm going to do is I'm going to come down to line 4 where I simply have 2 + 2 written, and as long as I'm anywhere in that line, on the PC I can hit Ctrl+Return, on the Mac I hit Command+Return, and it will run that line.
So, now what you see in the console on the bottom is all in blue, 2 + 2, that's the command that I wrote, and then it included the comment after the hash tag, and then beneath that, it gives the output; the result of this one. Now, you can tell the command, because it appears after the command prompt, that's the greater than sign, and the response appears after this index number. So, the one in the square bracket is the index number for a vector. The idea is that sometimes it puts out a whole lot of numbers, and it gives you the index number for the first number in that line.
In fact, I'll show you what it's like if there's more than one line. I'm going to come down to line number 6 in the script on the top where it says 1:100, and what that's going to do is it's going to print the numbers 1 to 100 across several lines. The cursor is there, so I can just hit Ctrl+Enter on the PC, or Command+Enter on the Mac, and now you see we have the index numbers. The first line begins with index number 1, the second line begins with index number 17, and so on. So, when you get your output, and you get these little cryptic numbers in the square brackets, that's just giving you the index number for the vector that it's dealing with.
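The two lines run so far can be sketched roughly as follows. This is a reconstruction rather than the actual exercise file, so the comment text is an assumption, and the number of values that fit on each console line (and therefore the index shown at the start of each line) depends on how wide the console is.

```r
# Anything after a hash sign is a comment and is not run.
2 + 2   # basic arithmetic; the console prints the result after [1]

# The colon operator builds the sequence 1, 2, ..., 100.
# Printing it wraps across several lines, and each line starts with
# the index of its first element in square brackets.
1:100
```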
Also, you may have noticed that there's no command terminator on these. For instance, I don't have to put a semicolon or any other mark at the end of the command. It simply does it one line at a time. If I have a command that's going to go more than one line, it's in parentheses, and I'll have examples of that later in this course. A customary thing, also, whenever you're learning a new language, like the R programming language, is to learn how to write "Hello World!" This one, because it's text, I just put print, and then in parentheses, I put the text that I want in quotes.
In this case, it's "Hello World!" So, I press Ctrl+Return on the PC, Command+Return on the Mac, and now I have my "Hello World!" I'm going to scroll down a little bit in this window. Because R is a programming language that was intended for working with data, it also works very well with variables. In line 11, I'm going to create a variable called x, and I'm going to put into it the numbers 1 through 5. Please note I have an assignment operator here; that is the <-, the arrow, and that's often read as gets, and so I would read this as x gets the numbers 1 to 5.
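As a minimal sketch of the two commands just described (the comments are illustrative and may not match the exercise file):

```r
# print() displays its argument; text goes inside quotes.
print("Hello World!")

# Assignment with the arrow operator, read as "x gets the numbers 1 to 5".
# An assignment on its own prints nothing to the console.
x <- 1:5
```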
I'm going to bring the cursor down there, and hit Ctrl+Return on my PC, Command+Return on the Mac, and you see now that I have x gets 1 to 5, and then it tells me that it's run that command, but also look off to the right side, the top right; you see there in the workspace, it's telling me that I have now created a variable called x. It's an integer with five numbers in it. If I actually want to see the numbers that are in x, all I have to do is enter the name of the variable, just x, and then I've got this hashtag comment after it that says display the values in x. So, I'm going to hit Ctrl+Return to run this line, or Command+Return on the Mac.
Now you see that I have five numbers: 1, 2, 3, 4, 5, and then the index number for the first one in the vector is 1, which is why that appears at the beginning of the line. Also, if I want to have a set of numbers that's not just sequential, but actual data, I have the option of using a function called concatenate. That's the C here. This is in line 13. I'm going to create a variable here called y, and I'm going to specify the values that I want in it. This time it's 6, 7, 8, 9, 10, and I put them in parentheses with the function c.
Again, that stands for concatenate, or sometimes called combine, or collection, because it puts them all together into this one variable. I have the cursor in line 13. I'm going to press Ctrl+Return on the PC, or Command+Return on the Mac, and you see down in the console at the bottom, I now have in blue that that command has run, and if you look into the workspace on the top right, you'll see that I now have not just the variable x, which has five values; I now have a variable y, which is numeric and also has five values.
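The difference the workspace pane reports between x and y can be checked directly. This sketch assumes the two variables are defined as described above; class() and length() are standard base R functions, but they are not part of the original script.

```r
x <- 1:5                  # a sequence made with the colon operator
y <- c(6, 7, 8, 9, 10)    # individual values combined with c()

class(x)    # "integer"
class(y)    # "numeric"
length(y)   # 5
```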
If I want to see what's in y, I can go back to the script on the top left here. My cursor is already at line 14, because in RStudio, any time you run a command, it bounces down to the next line, which is convenient. So, I'm going to press Ctrl+Return on the PC, Command+Return on the Mac, and now it shows me that I have these five values; 6, 7, 8, 9, 10, where the index number for the first one in the vector is 1. One of the really neat things about R is that it allows you to do vector-based mathematics, which is a way of working with what normally you'd call an array of data, but it allows you to do operations on them without having to specify for loops, and so the code can be much simpler here.
So, for instance, I have five numbers in my variable x, I have five numbers in my variable y, and if I want to add them to each other, where the first one in each one gets added, the second one in each one gets added, because they have the same number, all I have to do is write x + y. So, here I'm in line 15. I'm just going to press Ctrl+Return on the PC, Command+Return on the Mac, and this time, it not only shows me the command, it automatically outputs the results. That's because I'm not saving it as a new variable. I'm just running it.
So, here at the bottom of the console, you see that I now have 7, 9, 11, 13, and 15, and those are the sums of the items in those two variables. Also, if I want to simply multiply each of the elements in x, I can do that by writing x * 2, and it will do each element, and it will output it that way. The cursor is already in line 16 in the script on the top left. I'm going to hit Ctrl+Return to run that line on the PC, Command+Return on the Mac, and you see down in the bottom console, it shows that it's run that particular command, x * 2, and it's got the output here.
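The vector arithmetic described here might look like the sketch below. The explicit loop is not in the original script; it is added only to show what the one-line version saves you from writing, and the name total is a hypothetical choice.

```r
x <- 1:5
y <- c(6, 7, 8, 9, 10)

# Vectorized: elements are paired up by position.
x + y   # 7 9 11 13 15
x * 2   # 2 4 6 8 10

# The same sum written with an explicit loop, for comparison only.
total <- numeric(length(x))
for (i in seq_along(x)) {
  total[i] <- x[i] + y[i]
}
```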
It's five numbers; the index number of the first number is 1, and it goes 2, 4, 6, 8, 10. I just want to mention a couple of things about style and putting things together. I showed you that the assignment operator when you want to put values into a variable is this arrowhead, and so you say y gets the concatenation of 6, 7, 8, 9, 10 in line 13. It is possible to do this with an equals sign. R will run it, but that's considered poor style. In fact, there are several style manuals that have been written for coding in R. One of the more interesting ones is written by Google, which is nice because it's publicly available.
It's short and it's very clear. I'm going to go to my browser and show you that one. We have Google's R Style Guide, which talks about ways to name files, it talks about indentation, and the brackets, about assignment, and I suggest that as you begin to write your own code in R, you take a few minutes and go through this, so you can write code that is more readable by others, and will make better sense for you, and run more smoothly in R. I'm going to go back to R now, and I'm going to come down to the bottom here, and clear the console. I don't need that information anymore.
I'm going to hit Ctrl+L to clear it. Now, R is conceptually simple, and because it's command line based, you don't need a lot of menus. It can be very helpful to keep a few windows open simultaneously, such as we get to do here in RStudio, where we have the editor window, we have a Console window, we also have an indication of the variables that are active in the workspace, and we have access to information on packages, and help in the bottom right. R is a conceptually simple language, and it's a conceptually simple program. Because it's command line based, it's easy to save the information here in the editor, and share it with others.
I encourage you to take a little bit of time to look at the style manual, to find ways that you can write your own code to make it easiest for you to understand, and easiest to share with others.
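As a small illustration of the style point about assignment made earlier, both forms below run and produce the same vector. The variable name z is hypothetical and is used here only so the two results can be compared.

```r
# Preferred style: the arrow assignment operator.
y <- c(6, 7, 8, 9, 10)

# This also runs, but is considered poor style.
z = c(6, 7, 8, 9, 10)

identical(y, z)   # TRUE
```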
The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.
- What is R?
- Installing R
- Creating bar charts for categorical variables
- Building histograms
- Calculating frequencies and descriptives
- Computing new variables
- Creating scatterplots
- Comparing means