The five minutes you spend each week will provide you with a building block you can use in the next two hours at work. Review language basics, discover methods to improve existing R code, explore new and interesting features, and learn about useful development tools and libraries that will make your time programming with R that much more productive.
All series code samples can be downloaded at https://github.com/mnr/five-minutes-of-R.
Skill Level Intermediate
- [Instructor] If you're familiar with SQL, you're familiar with the phrases: INNER JOIN or FULL JOIN. And you're probably wondering how to do that with data frames NR. So, let's take a look at some of the commands that enable us to emulate INNER JOINS and OUTER JOINS. First, I need a couple of data frames, so I'll create df1 and df2. And it's important to look at what these things look like; df1 is a series of rows of capital letters followed by an index number; one through 26, df2 is lower case letters followed by an index of one through, well, not 26.
It goes 15, 20, and it jumps around a bit. And that's just to demonstrate how these rules will line up. We'll use df index as our common key to merge the two databases together. So, how do we do an INNER JOIN? In an INNER JOIN returns rows when there is a match in both tables. The command is simple, it's merge. And I merge df1 with df2. I'm going to hit control return.
Down in the console we can see that I have a merge between df1 and df2. First of all it gives me the index then the capital letters, then the lower case letters. I'll just take a close look here. You'll notice that right around here things start to go awry. 10, 11, 12, 13, 14 are missing because dfindex does not have a match between the two databases. Remember INNER JOIN only returns rows when there's a match in both tables.
Now let's take a look at a FULL JOIN. And to do a FULL JOIN it's the same. Merge, df1, df2, but this time we'll say all equals TRUE. And I'm going to hit control return, I now have a new merge, but you'll notice there are NAs. A FULL JOIN is all records from both tables, but it fills in nulls for missing matches. So, if we scroll down you can see that dfindex is used as the common key.
All the capital letters, all the lower case letters, but in line 11 the capital letters data frame has a corresponding letter for that. The lower case does not, so, it's givin us NA. Likewise, in 15, we've got a capital letter O and a lowercase K and then down towards the bottom you'll see that we have a line 27 which has a lower case r, but no corresponding upper case.
So, those are INNER JOINS and FULL JOINS. Or, FULL OUTER JOINS. In the next video, we'll talk about LEFT and RIGHT JOINS.