Join Mark Niemann-Ross for an in-depth discussion in this video The R programming language, part of Code Clinic: R.
- If you're new to R, you might want to check one of the courses in the lynda.com library. But let's take a moment to review some R basics. R is used by statisticians and researchers because of its ability to handle large data sets and statistical functions. R is also used by scientists because it can be used to clearly document the equations used to produce an answer. Unlike spreadsheets, R doesn't hide its formulas, and results can be easily reproduced, which is an important part of science.
Because R is an interpreted language, one thing it isn't good for is building applications. If you're going to build a mobile app, you'd best go with a compiled language such as Java, Swift, or Objective-C. R is most often thought of as a statistical computing language. But its most intriguing feature is support for matrix arithmetic. For example, let's add two plus two. I'll go to the console and type in 2+2 and hit Return.
You can see the answer is four. Now I can assign that value to a variable. So for example, myVar gets 2+2 and if I show the value, myVar, I see that myVar contains 4. I can see that either on the console in the lower left-hand corner, or in the global environment in the upper right-hand corner. Now let's add ten to myVar.
myVar + 10, Return, and I get 14, which makes perfect sense. 10 plus 4 equals 14. But here's where it gets tricky. I'm going to use the Combine function, c(), which lets you assign a list of values to a variable. So, myVar gets c(1, 3, 6, 12), and the Combine function assigns the numbers 1, 3, 6, and 12.
Now you'll notice that myVar contains a set of numbers, 1, 3, 6, and 12. And you can see that in the console in the lower left-hand corner, as well as the global environment in the upper right-hand corner. Now let's go ahead and add ten to myVar again. I'll simply type in myVar + 10, and you'll notice that the result is not quite what you might expect. When we added ten to myVar, it added ten to each element of myVar.
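The console steps so far can be sketched as a short script. This is only an illustration of what was typed, and the names scalarSum and shifted are made up here so that each result can be kept and inspected.

```r
# Scalar arithmetic and assignment: "myVar gets 2+2"
myVar <- 2 + 2
scalarSum <- myVar + 10    # 4 + 10 gives 14 (scalarSum is a hypothetical name)
print(scalarSum)

# c() combines several values into one vector
myVar <- c(1, 3, 6, 12)

# Adding a single number adds it to every element of the vector
shifted <- myVar + 10      # 11 13 16 22 (shifted is a hypothetical name)
print(shifted)

# myVar itself is unchanged because the result was not assigned back to it
print(myVar)               # 1 3 6 12
```

Typing myVar + 10 on its own only prints the result. To keep it, assign it to a variable, as the sketch does with shifted.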
So instead of a single number, the result is 11, which is 10 plus 1, and 13, which is 3 plus 10, and 16, which is 6 plus 10, and 22, which is 10 plus 12. myVar itself still holds 1, 3, 6, and 12, because I didn't assign the result back to it. This is called matrix arithmetic. And it's the best and most confusing feature of R. Let's create a second variable. We'll call it mySecondVar, and we'll assign a combination of 2, 4, 6, 8.
And you can see that in the global environment in the upper right-hand corner I now have mySecondVar, and it contains the numbers 2, 4, 6, and 8. Now I can add myVar plus mySecondVar. And you'll see what it's done is added the first number from myVar to the first number from mySecondVar. So, my first number in myVar is one, the first number in mySecondVar is two.
So one plus two equals three. For the next numbers, myVar's second number is three, mySecondVar's second number is four, three plus four equals seven, and so on. This becomes even more complex when dealing with unequal sets, or a matrix, or complex structures such as data frames. It's also useful to understand a little bit of how R is used to create a subset of a data set. What if we just wanted the first element of myVar? I would type in myVar[1].
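Here is a minimal sketch of adding the two vectors element by element. The last part goes a little beyond what was typed in the video: it shows what R does with unequal lengths, using an illustrative vector c(100, 200) that is not part of the course. The names pairSum and recycled are also made up for this example.

```r
myVar <- c(1, 3, 6, 12)
mySecondVar <- c(2, 4, 6, 8)

# Element-wise addition: 1+2, 3+4, 6+6, 12+8
pairSum <- myVar + mySecondVar
print(pairSum)             # 3 7 12 20

# Unequal lengths: R reuses ("recycles") the shorter vector,
# so this is 1+100, 3+200, 6+100, 12+200
recycled <- myVar + c(100, 200)
print(recycled)            # 101 203 106 212
```

Recycling is silent when the longer length is a multiple of the shorter one, which is one reason unequal sets can be confusing.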
What if I wanted the third value of myVar? I would type in myVar[3]. And you'll see that I'm returned the third element of myVar, which happens to be six. That's simple, but what about a matrix with rows and columns? First let's create a matrix. To do this, I type in matrix(1:28). And what that does is build a matrix, one column wide, with the numbers 1, 2, 3, and so on, up to 28.
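Subsetting a vector by position, and the one-column matrix, might look like the sketch below. The names firstValue, thirdValue, and oneColumn are placeholders chosen for this example.

```r
myVar <- c(1, 3, 6, 12)

# Square brackets pick elements by position; R counts from 1
firstValue <- myVar[1]
thirdValue <- myVar[3]
print(firstValue)          # 1
print(thirdValue)          # 6

# With no dimensions given, matrix() builds a single column
oneColumn <- matrix(1:28)
print(dim(oneColumn))      # 28 rows, 1 column
```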
Now let's create that same matrix but change something. I'll type in matrix(1:28), but I'll set the number of columns by adding ncol=7, so the command is matrix(1:28,ncol=7). This is going to produce a matrix with seven columns. I have 28 values, so I end up with 4 rows. 7 times 4 equals 28. I can change how that works by typing in matrix(1:28,nrow=4) and I'll come up with the same looking matrix, but this time I've specified rows instead of columns.
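As a rough sketch, the two calls above can be compared directly. The variable names are invented for this example. The byrow = TRUE call at the end is not used in the video; it is included only to show that matrix() fills column by column unless told otherwise.

```r
# Same 28 values, shaped by column count or by row count
byColumns <- matrix(1:28, ncol = 7)
byRows <- matrix(1:28, nrow = 4)

print(dim(byColumns))              # 4 rows, 7 columns
print(identical(byColumns, byRows))  # TRUE: both calls give the same matrix

# Values fill down each column first, so column 1 holds 1 2 3 4
print(byColumns[, 1])

# byrow = TRUE fills across each row instead, so row 1 holds 1 through 7
filledByRow <- matrix(1:28, nrow = 4, byrow = TRUE)
print(filledByRow[1, ])
```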
I can save that matrix by simply calling up the History command. Let's go over to History in the upper right-hand corner. These are all of the commands that I've typed in. I'll select the last History command, I'll send it to the console, and I'm going to store that into a variable simply by assigning it to myMatrix. Now if I look in the environment, I'll see that I now have a new value called myMatrix.
And if I show the value of that in the console in the lower left-hand corner, we'll see that in fact, I have the same seven columns and four rows. Let me clear... Now that we have a matrix, let's take a look at it again. myMatrix, and I'm gonna open up the console window so we have a lot of room to work with. Now if I only wanted the fourth column of my matrix which is 13, 14, 15, and 16, I can simply type in myMatrix, I'll put in brackets this time instead of parentheses, and I'll type in [,4].
And I'll explain in just a second, but you'll see that I am returned the values of the fourth column. The reason that I put in the comma with nothing preceding it is that when you specify a subset of a matrix in R, the first value is the row, the second value is the column. So for example, if I use the same command, myMatrix, and I type in [3,] instead of what I typed in before, which is [,4], I will get the third row, and you'll notice that I'm returned 3, 7, 11, 15, 19, 23, and 27, which is in fact the third row of the matrix.
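The row and column subsets just described can be sketched like this, assuming myMatrix was built with matrix(1:28, nrow = 4) as above. The names fourthColumn and thirdRow are placeholders for this example.

```r
myMatrix <- matrix(1:28, nrow = 4)

# Inside the brackets the row comes first, then the column.
# Leaving a position empty means "all of them".
fourthColumn <- myMatrix[, 4]   # every row, column 4
thirdRow <- myMatrix[3, ]       # row 3, every column

print(fourthColumn)             # 13 14 15 16
print(thirdRow)                 # 3 7 11 15 19 23 27
```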
Suppose that I wanted the value in the third row and fourth column? Well so far we've done rows, we've done columns. And we can combine the two, myMatrix[3,4] I want the third row, and the fourth column, and what I'll come back with is the value 15. Which happens to be the value in the third row and fourth column. Subsetting can become very complex. So for example, what if we wanted to subset every number in the third row divisible by five? To do that, what I'll do is type in first of all, myMatrix, which is the value that I want to subset from.
I'll type in a bracket because I'm subsetting. I want values from the third row, so I type in 3 and a comma, and then we're gonna examine the values in each column. What I'd like to find is any values in myMatrix in the third row, so I'll go ahead and put those brackets in there, myMatrix[3,], where %% 5, which is the remainder after dividing by 5, is equal to 0, meaning the value is divisible by 5. The whole command is myMatrix[3, myMatrix[3,] %% 5 == 0].
Now I realize this is some complex math, and I'll leave it to you to look up modulo later. If I hit Return, what we'll find is all of the values in the third row divisible by five. So 3 is not divisible by 5, 7 is not, 11 is not, 15 is divisible by 5, 19 is not divisible, 23 is not divisible, and 27 is not divisible by 5. We can change that to see if there are any values that are divisible by three.
And again, I can simply use the History command. I'll select the last command we selected, send that to the console, and I can backspace, and change the five to three. Now you'll see that I'm returned the values of 3, 15, and 27. Those are all values in row three that are divisible by three. You can probably see how R can become somewhat confusing. I'll explain the code I wrote to solve these problems, but nothing will substitute for an in-depth course.
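For reference, here is a sketch of the single-cell and divisibility subsets walked through above, again assuming myMatrix comes from matrix(1:28, nrow = 4). The intermediate names (cellValue, thirdRow, isMultipleOfFive, and so on) are invented here to break the one-line command into steps; they are not from the course files.

```r
myMatrix <- matrix(1:28, nrow = 4)

# Row 3, column 4
cellValue <- myMatrix[3, 4]
print(cellValue)                          # 15

# %% is the modulo operator: the remainder after division.
# A remainder of 0 means the value divides evenly.
thirdRow <- myMatrix[3, ]
isMultipleOfFive <- thirdRow %% 5 == 0    # one TRUE or FALSE per column
print(isMultipleOfFive)

# A logical vector in the column position keeps only the TRUE columns
divisibleByFive <- myMatrix[3, isMultipleOfFive]
print(divisibleByFive)                    # 15

# The same idea written as one line, this time for multiples of three
divisibleByThree <- myMatrix[3, myMatrix[3, ] %% 3 == 0]
print(divisibleByThree)                   # 3 15 27
```

Splitting the test into isMultipleOfFive makes it easier to see that the subset is driven by a vector of TRUE and FALSE values, one per column.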
Mark introduces challenges and then provides an overview of his solutions in R. Challenges include topics such as statistical analysis, searching directories for images, and accessing peripheral devices.
Skill Level: Intermediate
Q: RStudio tells me that it can't find files I expect to be available. Where can I find them?
A: Use the setwd() command to set the working directory to match the folder you're working in.
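A minimal sketch of the setwd() advice follows. The folder here is just R's temporary directory standing in for your own exercise folder, and example.csv is a made-up file name, so substitute the real path and file on your machine.

```r
# Hypothetical stand-in for the folder that holds the exercise files
projectDir <- tempdir()

# setwd() switches the working directory and returns the previous one
oldDir <- setwd(projectDir)
print(getwd())

# Relative file names are now looked up inside projectDir
writeLines("a,b", "example.csv")      # example.csv is a made-up file
fileFound <- file.exists("example.csv")
print(fileFound)                      # TRUE

# Tidy up and go back to where we started
unlink("example.csv")
setwd(oldDir)
```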
Q: I am unable to access the Lake Pend Oreille data from outside the U.S.
A: A static copy of this data is provided here for lynda.com members outside of the U.S.