From the course: Apache PySpark by Example


Working with rows


- [Instructor] We can filter rows based on certain conditions, so in PySpark we specify the DataFrame dot filter and then we specify the condition that we're looking to filter by. In pandas it's very similar, where you just specify the condition on a column within square brackets of the DataFrame. The other very interesting use case is unique rows, and this is when we want to determine the unique rows for a column. So in PySpark we select from the DataFrame and then we tack the distinct function on at the end, so it's df.select, the column names that you're looking to determine the unique rows for, dot distinct, and then you can display them using the show command. In pandas you would've used the unique function. Now sorting is a very important function, and in PySpark we use orderBy. In pandas you would've used the sort_values function and you provided the column name. Now since DataFrames are immutable you can't just add to the DataFrame. Instead what you have to do is union the original…
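As a rough illustration of what the instructor describes, here is a minimal PySpark sketch. The SparkSession setup and the name/age columns are invented for the example, not taken from the course's dataset; the pandas equivalents are noted in comments.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rows-demo").getOrCreate()

# Hypothetical sample data; the course uses its own dataset
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Alice", 29)],
    ["name", "age"],
)

# Filter rows on a condition (pandas: df[df["age"] > 30])
df.filter(F.col("age") > 30).show()

# Unique values for a column (pandas: df["name"].unique())
df.select("name").distinct().show()

# Sort rows (pandas: df.sort_values("age", ascending=False))
df.orderBy(F.col("age").desc()).show()

# DataFrames are immutable, so "adding" a row means
# unioning the original DataFrame with a new one
new_row = spark.createDataFrame([("Carol", 52)], ["name", "age"])
df = df.union(new_row)
df.show()
```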
