From the course: Analyzing Big Data with Hive

When to use SEMI JOIN

- [Instructor] Now let's take a look at filtering results using LEFT SEMI JOIN. This is a method of checking whether a related record exists in another data set, and it accomplishes the same thing as the ANSI-standard EXISTS clause. So what we have essentially are two ways of accomplishing this. Let's say we have Table A and Table B. In Table A, we have a name and a value: Ben and Jennie, with values of 100 and 200. Then in Table B, we just have Jennie. Let's say Table A was all customers and Table B was VIP customers, and we wanted to get everything from Table A, filtered down to only the people who also exist in Table B, the VIP customers. In ANSI SQL, and in Hive as of 0.13, we could use the EXISTS clause. So we'd say select star from a, everything from Table A, where exists, select star from b, and then we kind of have a join clause where a.name = b.name. So essentially it's the same idea as if we wanted to do a join between…
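
The two approaches described above can be sketched in HiveQL as follows. This is a minimal illustration rather than the course's own script: the table names a and b and the name and value columns come from the example, while the column types and the setup statements are assumptions. INSERT ... VALUES needs Hive 0.14 or later, and the EXISTS subquery selects b.name where the transcript says select star; since EXISTS only checks whether a matching row exists, the selected column should not change the result.

```sql
-- Assumed setup for the example: a = all customers, b = VIP customers.
-- Column types are illustrative guesses; the transcript does not give them.
CREATE TABLE a (name STRING, value INT);
CREATE TABLE b (name STRING);

-- INSERT ... VALUES requires Hive 0.14 or later.
INSERT INTO TABLE a VALUES ('Ben', 100), ('Jennie', 200);
INSERT INTO TABLE b VALUES ('Jennie');

-- Option 1: correlated EXISTS subquery (supported in Hive 0.13 and later).
-- Keeps each row of a for which at least one matching row exists in b.
SELECT *
FROM a
WHERE EXISTS (SELECT b.name FROM b WHERE a.name = b.name);

-- Option 2: LEFT SEMI JOIN.
-- Only columns from the left table (a) may appear in the SELECT list;
-- b is referenced only in the ON condition.
SELECT a.*
FROM a
LEFT SEMI JOIN b ON (a.name = b.name);
```

With the sample rows above, both queries should return only the row for Jennie with a value of 200, because Ben has no match in b. Unlike a regular inner join, neither form repeats a row from a when b holds several matching rows.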