From the course: Introduction to Spark SQL and DataFrames
Set up a Jupyter notebook
- [Instructor] All right, I've opened a terminal and navigated to my working directory: inside my Spark SQL directory, I've created one called work. I'll list the files to show it's empty right now. Now I'm going to run pyspark, which will start a Jupyter notebook for me. Because the directory is empty, there are no notebooks here yet, so I'll create a new notebook using Python 3. The first thing I want to do is load some data, but before I can start working with the data, I need to do a little setup. First, I need an import from the PySpark SQL package: from pyspark.sql, I want to import the one thing I need for this example, called SparkSession, so let's load that. Now I want to actually create a Spark session, and that's basically a pointer to a data structure that represents the cluster and allows me to send commands to the cluster and…
Contents
- Set up a Jupyter notebook (2m 1s)
- Load data into DataFrames: CSV Files (7m 26s)
- Load data into DataFrames: JSON Files (3m 16s)
- Basic DataFrame operations (3m 26s)
- Filter data with DataFrame API (2m 13s)
- Aggregate data with DataFrame API (3m 47s)
- Sample data from DataFrames (5m 25s)
- Save data from DataFrames (3m 27s)