Join Jack Dintruff for an in-depth discussion in this video, Joins: Outer, inner, and left vs. right, part of Data Analysis on Hadoop.
- [Voiceover] Joins in Pig are just like joins in SQL and any other sort of relational database language. So, there's two different features of a join and each one has two options. The first feature is whether it's left or right, the second feature is whether it is inner or outer. You either have a left outer join, which will pull everything from the left side and try to match to the right side, or you can do a right outer join, which does the same thing to the right side, maintains all the values on the right side and attempts to match them up with some value on the left. So right now, what we have is we have a user table, right, which has a whole bunch of users, and we have a comments table, and they both have a common field, which is what we're going to be joining on: they both have a UserId.
We can do a left outer once we have a single user. We can have that be on our left side so that we know that we're only matching to all of their comments, but we can also go the other way. We can take the entire user table on the left and say,…
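The left and right outer joins described in the transcript can be sketched in plain Python to show which rows survive on each side. This is only an illustration of the semantics, not the code used in the video: the sample rows, the `Name` and `Text` fields, and the helper names `left_outer_join` and `right_outer_join` are all assumptions; only the shared `UserId` field comes from the transcript. In Pig Latin the equivalent statement would look something like `JOIN users BY UserId LEFT OUTER, comments BY UserId;`.

```python
# Hypothetical sample rows; only the shared UserId field comes from the transcript.
users = [
    {"UserId": 1, "Name": "ana"},
    {"UserId": 2, "Name": "ben"},
]
comments = [
    {"UserId": 1, "Text": "first"},
    {"UserId": 1, "Text": "second"},
    {"UserId": 3, "Text": "orphan"},
]


def left_outer_join(left, right, key):
    """Keep every left row; pair it with each matching right row, or None."""
    # Index the right side by the join key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for l_row in left:
        matches = index.get(l_row[key])
        if matches:
            joined.extend((l_row, r_row) for r_row in matches)
        else:
            # No match on the right: the left row is kept with an empty partner.
            joined.append((l_row, None))
    return joined


def right_outer_join(left, right, key):
    """Same idea with the sides swapped: every right row survives."""
    return [(l_row, r_row) for r_row, l_row in left_outer_join(right, left, key)]


left_result = left_outer_join(users, comments, "UserId")
right_result = right_outer_join(users, comments, "UserId")
```

With these sample rows, `left_result` keeps the user with no comments (paired with `None`), while `right_result` keeps the comment whose `UserId` matches no user.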
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop distributed file system (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administrating clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups