Learn the difference between join types with Hadoop.
- [Voiceover] We're gonna do two different types of joins just to demonstrate the difference between a left join and a right join. A left join is going to keep every single key on the left side. A right join is going to keep every single key on the right side. And then they're going to attempt to find a corresponding key on the opposite side of the join to match with. And so, because we're doing inner joins, if there is not a match, those rows will be dropped. Now, in theory, because this is a completed database, there should be no instance in which there is a comment with a user ID that does not exist in the user ID table. So, we're going to do our first join.
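To make the row-keeping rules concrete, here is a minimal sketch in plain Python rather than Hive. It assumes two tiny in-memory tables; the sample rows, the column names, and the `join` helper are all hypothetical and exist only for illustration. One comment deliberately points at a user ID that does not exist, so the difference between dropping and keeping unmatched rows is visible.

```python
# Hypothetical sample rows; names and values are illustrative only.
users = [
    {"user_id": 1, "name": "ada"},
    {"user_id": 2, "name": "grace"},
]
comments = [
    {"comment_id": 10, "user_id": 1, "text": "first"},
    {"comment_id": 11, "user_id": 3, "text": "orphan"},  # no matching user
]


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is 'inner', 'left', or 'right'."""
    if how == "right":
        # A right join is a left join with the two sides swapped.
        return join(right, left, key, how="left")
    # Index the right side by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    result = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        for rrow in matches:
            result.append({**rrow, **lrow})
        if not matches and how == "left":
            # Outer behavior: keep the unmatched left row. The right-side
            # columns are simply absent here (a database would show NULLs).
            result.append(dict(lrow))
    return result


inner_rows = join(comments, users, "user_id", how="inner")
left_rows = join(comments, users, "user_id", how="left")
right_rows = join(comments, users, "user_id", how="right")
```

With `comments` on the left, the inner join drops the orphan comment, the left join keeps it without a `name`, and the right join keeps the user who has no comments. If every comment had a matching user, as the voiceover expects for this dataset, the inner and left results would contain the same rows.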
And, our first join is going to be a left join and we are not going to specify an outerness, so it's going to be an inner join by default. So, let's go ahead and take a look at what this is gonna look like. So, we're going to use the stackoverflow database just so that if we were to create a new table, which we will, it will be created within the stackoverflow database.
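The idea of creating a new table from a join can be sketched as follows. This is not the course's Hive session: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in, and the table names, column names, and rows are assumptions made up for the example. In SQLite, as in standard SQL, a bare `JOIN` is an inner join, while `LEFT JOIN` and `LEFT OUTER JOIN` mean the same thing; how a given Hive or Pig version treats these keywords should be checked against its own documentation.

```python
import sqlite3

# SQLite stands in for Hive here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE comments (comment_id INTEGER, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO comments VALUES (10, 1, 'first'), (11, 3, 'orphan');
""")

# A bare JOIN is an inner join: comment 11 has no matching user and is dropped.
conn.execute("""
    CREATE TABLE comments_inner AS
    SELECT c.comment_id, c.user_id, u.name
    FROM comments c JOIN users u ON c.user_id = u.user_id
""")

# LEFT OUTER JOIN keeps every comment; the unmatched name comes back as NULL.
conn.execute("""
    CREATE TABLE comments_left AS
    SELECT c.comment_id, c.user_id, u.name
    FROM comments c LEFT OUTER JOIN users u ON c.user_id = u.user_id
""")

inner = conn.execute("SELECT * FROM comments_inner ORDER BY comment_id").fetchall()
left = conn.execute("SELECT * FROM comments_left ORDER BY comment_id").fetchall()
```

Both new tables are created in whichever database the connection points at, which mirrors the reason given in the voiceover for selecting the database before running the join.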
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop Distributed File System (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administrating clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups