From the course: Apache PySpark by Example

Working with DataFrames

- [Instructor] In the next few videos, I'll provide both the Pandas syntax and the PySpark syntax. If you're familiar with Pandas, it'll make the transition from Pandas to PySpark just that little bit easier. When working with Pandas, we need to import the Pandas library. With Apache Spark, we need a Spark session as our interface, so all we do is import PySpark and get access to the Spark session via spark. Assuming that we're reading a CSV file in, we can create a DataFrame by loading that CSV file. If you look at the Pandas documentation, you can see that there are significantly more options available to you when reading a CSV file. Spark allows you to read a CSV file by just typing spark.read.csv and the path to that file. In Pandas, you can view the first few rows of your DataFrame by specifying the DataFrame name and the number of rows you want to view. In this instance, we want to view the first three rows of the DataFrame df. In Spark, you have a couple of options…
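
To make that comparison concrete, here is a minimal sketch of both setups. The file path 'crimes.csv' is a hypothetical placeholder, not a file from the course, and header=True is an assumption that the file's first line holds column names.

    # Pandas: import the library, then load the CSV into a DataFrame
    import pandas as pd

    pandas_df = pd.read_csv('crimes.csv')  # 'crimes.csv' is a hypothetical path

    # PySpark: the SparkSession is our interface to DataFrame functionality
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.read.csv('crimes.csv', header=True)  # header=True assumes a header row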
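
And a sketch of viewing the first three rows, continuing from the DataFrames above. The transcript cuts off before naming Spark's options, so show and take below are standard PySpark methods rather than necessarily the ones demonstrated in the video.

    # Pandas: head(n) returns the first n rows as a DataFrame
    pandas_df.head(3)

    # PySpark: show(n) prints the first n rows to the console...
    spark_df.show(3)

    # ...while take(n) returns them as a list of Row objects
    spark_df.take(3)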
