Join Robin Hunt for an in-depth discussion in this video Joins, part of The Basics of Data for Analytics.
- View Offline
- Joins. These are a key concept for any analyst who's going to build reports. Joins include or exclude data so understanding them is super important. First of all, understand that data can be in multiple data sets, and those data sets need bridges. And these bridges are called Joins. So if you define a Join, and we're using SQL as an example here. A join clause combines records from two or more tables in a relational database.
So, usually when we're tying tables together we're doing that in queries. It's important to understand that the joins you need don't always exist. It's important to understand how tables relate to one another since all the joins don't just pop up for us. We might actually have to create the joins, and that's only after figuring out what should be joined. So, let's get a basic understanding of the join types.
The two tables I have for our concept here are Table A and Table B. And so Table A has some of the same people in Table B. And then in Table A we have a couple of people that may not be in Table B. And Table B might have a person or two that's not in Table A. And the joins will tell us who's what and where. So there are three types of joins we're going to discuss today: Inner, left outer, and right outer. Remember, they control the data that we see in our results.
So they're very important to understand what they mean. So an inner join requires that there is a record in both of the joined tables. So I have an example to show you here of our Table A and Table B, but I think it's important to note that it may not always be the best join for what you're trying to accomplish. So if I tie Table A and Table B together, I will see the results of Robin, Daniel, and Marilyn. That's because Robin, Daniel, and Marilyn exists in both Table A and Table B, and an inner join has no choice, but to only show where I have equals in both tables.
Here's another example. So I've labeled this Table Group A and Table Group B. Now if you look real close, Table Group A is larger, but Table Group B contains some of the same people from Table Group A. If I were to join the groups together through an inner join, then I would only see the people in A that match the people in B. My final report set would be the people that matched in both tables. Okay, so let's talk about Left Outer.
So a left join is going to show everybody in Table A, and if there's a match in Table B. Let's look at our example here. So in Table A, I show Robin, Daniel, David, and Marilyn. Table B has Robin, Daniel, and Marilyn and it shows a NULL for David because he doesn't exist in Table B. This is great when you're trying to run reports for everyone in Table A and if there is a match in Table B.
A right outer join is not that different from a left outer join, except it'll pull everything from the right table, and it'll put a NULL in on the left table where our value doesn't exist. Our example here shows that we have Table B, which is the right table, shows Robin, Daniel, Jake, and Marilyn. Table A shows Robin, Daniel, and a NULL for Jake since he doesn't exist in Table A, and then Marilyn. Joins are key to producing data sets that are representative of the goal of your data.
It's one of the hardest fundamentals that you'll have to learn. I would encourage you to watch any video related to Joins and NULLS.