From the course: Apache PySpark by Example


Working with joins


- [Instructor] A join brings together two sets of data, the left and the right. Spark compares the values of one or more keys in the left and right data and evaluates a join expression to decide whether it should bring the two sets of data together. The join expression determines whether two rows should join, and the join type determines what the result should be. Here are a couple of the different join types: we have inner joins, outer joins, left outer joins, and right outer joins, amongst others. The PySpark syntax for joins is very simple. Df is the left DataFrame, which is joined with df2, the right DataFrame. Df.column equals df2.column determines whether the two rows should join, and how is the join type, so that's an inner join, outer join, left outer join, and so on. Let's take a look at the different join types in more detail. Inner joins keep rows with keys that exist in the left and right dataset. Outer joins keep rows with keys in either the left or…
