At the end of this video, the student will know how and why to use the SEMI JOIN option in Hive.
- [Instructor] Now let's take a look at filtering results using LEFT SEMI JOIN. This is a method of finding out whether a related record exists in another data set. It accomplishes the same thing as the ANSI-standard EXISTS clause, so essentially we have two ways of doing this. Let's say we have Table A and Table B. In Table A, we have a name and a value: Ben and Jennie, with values of 100 and 200. Then in Table B, we just have Jennie. Let's say Table A is all customers and Table B is VIP customers, and we want to get everything from Table A, but filtered down to only the people who also exist in Table B, the VIP customers.
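As a rough sketch of the setup described above, the two tables and the LEFT SEMI JOIN query might look like the following in HiveQL. The table names `a` and `b` and the columns `name` and `value` follow the transcript. The column types, the pairing of Ben with 100 and Jennie with 200, and the `INSERT ... VALUES` statements (which need Hive 0.14 or later) are assumptions made for illustration.

```sql
-- Table A: all customers (column types are assumed)
CREATE TABLE a (name STRING, value INT);
-- Table B: VIP customers only
CREATE TABLE b (name STRING);

-- Sample rows from the example; INSERT ... VALUES assumes Hive 0.14+
INSERT INTO TABLE a VALUES ('Ben', 100), ('Jennie', 200);
INSERT INTO TABLE b VALUES ('Jennie');

-- Keep only the rows of a that have a matching name in b.
-- With LEFT SEMI JOIN, b may be referenced only in the ON clause,
-- so the SELECT list uses columns from a alone.
SELECT a.name, a.value
FROM a
LEFT SEMI JOIN b ON (a.name = b.name);
```

With the sample rows assumed here, only Jennie's row from Table A should come back, since Ben has no match in Table B.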
In ANSI SQL, and in Hive as of version 0.13 (so this works in both), we could use the EXISTS clause. So we'd say SELECT * FROM a, everything from Table A, WHERE EXISTS, SELECT * FROM b, and then we kind of have a join clause, WHERE a.name = b.name. So essentially it's the same idea as if we wanted to do a join between these tables, but because the value of name…
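The EXISTS form the instructor spells out can be written as the following HiveQL. This is a sketch that assumes tables `a` and `b` share a `name` column, as in the example, and that the Hive version is 0.13 or later.

```sql
-- Return every row of a for which at least one row in b has the same name.
-- The inner WHERE clause correlates the subquery with the outer table,
-- playing the role of the join condition.
SELECT *
FROM a
WHERE EXISTS (
  SELECT * FROM b WHERE a.name = b.name
);
```

On the example data, this should return the same rows as a LEFT SEMI JOIN of `a` to `b` on `name`.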
This course shows how to use Hive to process data. Instructor Ben Sullins starts by showing you how to structure and optimize your data. Next, he explains how to get Hue, the Hadoop user interface, to leverage HiveQL when analyzing data. Using the newly configured option, he then demonstrates how to load data, create aggregate tables for fast query access, and run advanced analytics. He also takes you through managing tables and putting functions to use. This course is designed to help you find new ways to work with datasets so you can answer the tough data science questions that come your way.
- Defining data structures in Hive
- Selecting data
- Joining tables
- Manipulating data
- Filtering results
- Aggregating data
- Using built-in aggregate functions
- Mastering built-in table-generating functions
- Using CUBE and ROLLUP
- Using clauses: WHERE and HAVING
- Using LIKE, JOIN, and SEMI JOIN
- Using functions: String, math, date, and conditional