Learn to use R subsetting to extract the data you need.
- [Narrator] One of the most impressive features of R is its ability to slice data with something called subsetting. It's really important that you learn how to use subsetting; it'll save you a lot of time. So let's take a look at that. First, let's look at the built-in constant called letters. This is simply a vector of the 26 lowercase letters of the alphabet (the uppercase version is a separate constant called LETTERS). I can do a quick subset by typing letters, then a square bracket and 3.
That gives me the third element of letters, which is c. I can also pull out sets of elements: I type letters, a bracket, and 3:5, which produces the third, fourth, and fifth elements of letters. I can also mix single positions and ranges. I type letters, a bracket, and c(3, 20:25), and what that produces is the third element plus the 20th through 25th elements of letters.
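Here are those three positional selections as R code. The variable names are only for illustration; letters itself is the built-in vector of lowercase letters.

```r
# letters: built-in character vector "a", "b", ..., "z"
one_letter  <- letters[3]            # a single position
a_range     <- letters[3:5]          # positions 3, 4 and 5
mixed_picks <- letters[c(3, 20:25)]  # position 3 plus positions 20 through 25

print(one_letter)   # [1] "c"
print(a_range)      # [1] "c" "d" "e"
print(mixed_picks)  # [1] "c" "t" "u" "v" "w" "x" "y"
```

The c() matters in the last line: on a vector, letters[3, 20:25] is an error, because a comma inside the brackets asks for a second dimension that a vector doesn't have.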
You can also exclude elements. I type letters, which is what we want to search through, then a bracket and a negative range of positions. The negative sign means "I don't want these," so this drops the third through fifth elements of letters. You can see we have a and b, and then we jump to f. Another way to do this is to type letters, a bracket, and then c(-3:-5); I get exactly the same thing.
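A short sketch of negative indexing, including one precedence trap that is easy to hit when typing negative ranges:

```r
# A negative index removes positions instead of selecting them.
drop_c_to_e <- letters[-3:-5]     # -3:-5 is the sequence -3, -4, -5
also_drop   <- letters[c(-3:-5)]  # wrapping it in c() gives the same vector

print(head(drop_c_to_e, 4))               # [1] "a" "b" "f" "g"
print(identical(drop_c_to_e, also_drop))  # [1] TRUE

# letters[-3:5] is an error: unary minus binds more tightly than ':',
# so -3:5 is -3, -2, ..., 5, which mixes negative and positive indices.
# letters[-(3:5)] negates the whole range and works like the lines above.
```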
So there are two ways to do exactly the same thing. I can also select using TRUE and FALSE. To do that I need another vector, one that contains TRUE and FALSE values. Let me quickly show you a function called rep(), short for repeat. I'll tell it to repeat the vector c(TRUE, FALSE), and I want to repeat it 13 times. So it produces TRUE, FALSE, TRUE, FALSE, and so on: 13 pairs, or 26 values, one for each letter. Now if I type letters, a bracket, and exactly the command I just typed, rep(c(TRUE, FALSE), 13), I get every other letter in the alphabet.
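The logical selection described here, sketched in R (keep_pattern and every_other are illustrative names):

```r
# rep() repeats c(TRUE, FALSE) 13 times, giving 26 flags: one per letter
keep_pattern <- rep(c(TRUE, FALSE), 13)
print(length(keep_pattern))  # [1] 26

# Keep each letter whose flag is TRUE, drop each letter whose flag is FALSE
every_other <- letters[keep_pattern]
print(every_other)  # [1] "a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"
```

R also recycles a shorter logical index, so letters[c(TRUE, FALSE)] returns the same 13 letters; building the full vector with rep() just makes the one-flag-per-letter idea explicit.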
That's because TRUE lines up with a, so a is kept, and FALSE lines up with b, so b is dropped; the pattern continues all the way through z. Two-dimensional data can also be subset, and I'll need a data frame to do that. So let me create a data frame real quick. We'll call it lots of letters. Now I have a data frame called lots of letters, and you can see it up here in the Global Environment pane. Let's take a quick look at that data frame: you can see that I have three variables.
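The video doesn't show the code that builds lots of letters. Here is a minimal sketch, assuming the object is named lotsOfLetters and its three columns are named LETTERS, letters, and position; those names are guesses for illustration, not taken from the course files.

```r
# Hypothetical reconstruction: object and column names are assumptions
lotsOfLetters <- data.frame(
  LETTERS  = LETTERS,         # uppercase letters
  letters  = letters,         # lowercase letters
  position = 1:26,            # position in the alphabet
  stringsAsFactors = FALSE    # keep text columns as character on R < 4.0
)

print(dim(lotsOfLetters))  # 26 rows, 3 columns
str(lotsOfLetters)
```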
The three variables hold the uppercase letters, the lowercase letters, and each letter's position in the alphabet. And of course there are 26 rows, because that's how many letters there are in the alphabet. So let's go back to our example. Now that I've got a data frame called lots of letters, I can subset it. I type lots of letters, which is the data frame, and then a bracket. Now I'm going to select the third row.
I put in a comma, and I don't put anything after the comma. This subsets the third row and all of the columns, or variables. Inside the brackets, whatever comes before the comma picks rows and whatever comes after it picks columns; leaving a side blank means "all of them." So I can select something different: I type lots of letters, a bracket, nothing, then a comma and 3. What I've selected is every element of the third variable. I can also subset by the name of the variable: lots of letters, followed by a bracket, followed by the name of the variable I want to select, in quotes.
When I hit Return, I get the contents of the first variable, which of course is all the capital letters. I can also select ranges. I type lots of letters, a bracket, and then 3:8, which gives me rows three through eight, followed by a comma and the second variable. So I get lowercase letters three through eight, c through h. I can also select rows using logical conditions.
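The row and column selections walked through so far, as a sketch. It rebuilds the data frame so the block runs on its own, and it assumes the same hypothetical names as before (lotsOfLetters with columns LETTERS, letters, and position).

```r
# Assumed data frame (names are hypothetical, not from the course files)
lotsOfLetters <- data.frame(LETTERS = LETTERS, letters = letters,
                            position = 1:26, stringsAsFactors = FALSE)

row3      <- lotsOfLetters[3, ]           # row 3, every column
col3      <- lotsOfLetters[, 3]           # every row of column 3
by_name   <- lotsOfLetters["LETTERS"]     # one-column data frame, chosen by name
as_vector <- lotsOfLetters[, "LETTERS"]   # same column as a plain vector
rows_3to8 <- lotsOfLetters[3:8, 2]        # rows 3 to 8 of column 2

print(row3)
print(rows_3to8)  # [1] "c" "d" "e" "f" "g" "h"
```

Note the difference between the two by-name forms: without a comma, data frame brackets return a data frame; with a comma and a single column, R simplifies the result to a vector.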
So let's type this: lots of letters, followed by a bracket. I want to select the rows where the capital letter equals, let's say, R. And then, after the comma, I want the corresponding value from the letters column. And you can see that R gave us an error. It takes a second to work out exactly what happened.
The problem is assignment versus equality. So let's type that in again: lots of letters and then a bracket. We're going to use the built-in constant LETTERS. Last time I typed a single equals sign; what I should have typed is a double equals sign. A single = is for assignment and, inside brackets or a function call, for naming an argument, so it doesn't compare anything. A double == tests for equality. So now what I'm asking is: is LETTERS equal to "R"?
Because it's in front of the comma, that comparison selects rows: only the rows where it is TRUE are kept. And then I want to return the corresponding value from the column labeled letters. So let's go ahead and hit Run. You can see that we've gotten a lowercase r: it went to the row where the capital letter is R and, from the letters column, gave us a lowercase r. We can do "and" and "or" as well, so let's go ahead and do that: lots of letters, and I'll type a bracket.
Then we type LETTERS == "R". By typing the vertical bar, or pipe symbol, |, which R uses as the logical "or" operator (it's usually on the right-hand side of the keyboard, depending on your layout), I can say: any row where LETTERS is equal to R, or LETTERS is equal to T. The ampersand, &, works the same way for "and." So that's going to give us two rows, R and T. And I want to return the lowercase values of those.
So I type a comma, because we're going to pull the values from the variable that holds the lowercase letters. And when I hit Return, what I get back is a lowercase r and a lowercase t. So that's a quick look at subsetting. Again, subsetting is worth practicing and spending some time with. It'll save you a lot of time when you actually start building formulas in R.
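The logical-condition selections from this lesson, sketched in R. As before, lotsOfLetters and its column names (LETTERS, letters, position) are assumed for illustration; the "and" example with & is an extra illustration, not one shown in the video.

```r
# Assumed data frame (names are hypothetical, not from the course files)
lotsOfLetters <- data.frame(LETTERS = LETTERS, letters = letters,
                            position = 1:26, stringsAsFactors = FALSE)

# == compares and returns one TRUE/FALSE per row (use ==, not =)
is_r <- lotsOfLetters$LETTERS == "R"
print(sum(is_r))  # [1] 1

# Logical vector before the comma picks rows; the name after it picks the column
just_r <- lotsOfLetters[is_r, "letters"]
print(just_r)  # [1] "r"

# Using the built-in LETTERS works too, but only because the rows are in A-Z order
same_r <- lotsOfLetters[LETTERS == "R", "letters"]

# | is element-wise "or": rows where the capital letter is R or T
r_or_t <- lotsOfLetters[lotsOfLetters$LETTERS == "R" |
                          lotsOfLetters$LETTERS == "T", "letters"]
print(r_or_t)  # [1] "r" "t"

# & is element-wise "and": vowels that come after position 20
late_vowel <- lotsOfLetters[lotsOfLetters$position > 20 &
                              lotsOfLetters$letters %in% c("a", "e", "i", "o", "u"),
                            "LETTERS"]
print(late_vowel)  # [1] "U"
```

For longer "or" lists, lotsOfLetters$LETTERS %in% c("R", "T") gives the same rows as chaining | comparisons and is easier to read.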
The five minutes you spend each week will provide you with a building block you can use in the next two hours at work. Review language basics, discover methods to improve existing R code, explore new and interesting features, and learn about useful development tools and libraries that will make your time programming with R that much more productive.
All series code samples can be downloaded at https://github.com/mnr/five-minutes-of-R.