From the course: Apache PySpark by Example

Working with columns

Working with columns - Spark DataFrames Tutorial

- [Instructor] In the next few videos, I'll provide both the pandas syntax and the PySpark syntax. If you're familiar with pandas, that will make the transition to PySpark easier. In PySpark, you can access a DataFrame's column either by attribute, using dot notation, or by indexing, using square brackets. Dot notation doesn't always work: it breaks when a column name is a reserved word or clashes with an attribute or method of the DataFrame class. If you've used pandas, similar rules apply, and you can use either df.column or df["column"]. In both PySpark and pandas, df.columns gives you the column names. In both PySpark and pandas, you can select more than one column by passing a list within square brackets. In PySpark, it's more common to use df.select and then list the column names that you want. Let's say you wanted to add a new column to your DataFrame, where the values in this new column are twice that of…
