From the course: Apache PySpark by Example

What you should know - Spark DataFrames Tutorial

What you should know

- [Narrator] I have designed this course with plenty of practice exercises, so it does exactly what it says on the tin: you learn PySpark by example. We're going to use Google Colab to run our PySpark environment in the cloud. If you haven't used a cloud environment before, don't worry; it's really very easy, and I'll show you how to do it. You're also welcome to install Spark locally and run the exercise files from there. I'm using Spark version 2.3, but you can easily use another version as long as it's at least version 2. If you try to run Apache Spark locally and end up with a whole lot of Java errors, I suggest you switch to the Google Colab environment for now. I think your time is better spent learning how to use Spark rather than learning how to install it. I assume that most of you have some experience working with Python's pandas, so I've included a section on what you should do in PySpark and what the pandas equivalent would be, to make the transition easier. If you don't know pandas, you can check out my other course on pandas in the LinkedIn Library. Either way, don't worry: you can still learn PySpark from scratch by following along.
