Google Colab provides a great cloud-based environment for running Jupyter notebooks. In this video, learn how to set up Google Colab to run PySpark.
- [Instructor] We're going to be using Google's Colab as our lab environment for Apache Spark. Colab is a hosted Jupyter Notebook environment, and what's great is that it's free, and it requires no setup. Colaboratory works with most major browsers, but I suggest working with desktop versions of either Chrome or Firefox. Not surprisingly, Google's Colab notebooks are very similar to Jupyter notebooks, and you can upload or import an existing Jupyter or IPython notebook into Colab. A huge bonus is that Colab notebooks are also stored in the open source Jupyter Notebook format. That's the .ipynb.
If you haven't used a Jupyter Notebook before, don't worry, I'll show you the basics in a couple of minutes. You also have access to the shell. You can use common Linux commands such as !wget, or !pwd to get the present working directory. And just so you know, the code is executed in a virtual machine dedicated to your account. These virtual machines are recycled when idle for a while. Colab notebooks are just like Google Docs and Sheets,…
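To make the .ipynb point concrete, here is a minimal sketch of what such a file might look like on disk. It assumes the version 4 notebook layout, in which a notebook is a JSON document with a list of cells. The cell contents, the file name lab_check.ipynb, and the helper summarize_notebook are hypothetical and not taken from the course files. The code cell holds the !pwd shell command mentioned above; the leading ! only has meaning when the cell runs inside a notebook.

```python
import json
import os
import tempfile

# A hand-built minimal notebook, assuming the version 4 notebook layout.
# The cell contents are illustrative, not taken from the course files.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Lab environment check"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            # In a notebook, a leading "!" sends the line to the shell.
            "source": ["!pwd"],
        },
    ],
}


def summarize_notebook(path):
    """Return (cell_type, source_text) pairs for each cell in an .ipynb file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]


# Write the notebook to a temporary .ipynb file, then read it back as JSON.
with tempfile.TemporaryDirectory() as tmp_dir:
    nb_path = os.path.join(tmp_dir, "lab_check.ipynb")
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
    summary = summarize_notebook(nb_path)

for cell_type, source in summary:
    print(f"{cell_type}: {source}")
```

Because the file is plain JSON, the same notebook can be opened in either Jupyter or Colab, which is what makes uploading or importing an existing notebook possible.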
Released 1/31/2019
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level Intermediate
Introduction
- Apache PySpark (1m 5s)
- What you should know (1m 4s)
1. Introduction to Apache Spark
- The Apache Spark ecosystem (4m 38s)
- Why Spark? (4m 53s)
- Spark origins and Databricks (2m 48s)
- Spark components (2m 32s)
2. Technical Setup
- Set up the lab environment (4m 53s)
- Importing (1m 34s)
3. Working with the DataFrame API
- The DataFrame API (2m 9s)
- Working with DataFrames (2m 44s)
- Schemas (8m 29s)
- Working with columns (5m 11s)
- Working with rows (5m 42s)
- Challenge (31s)
- Solution (3m 6s)
4. Functions
- Built-in functions (8m 24s)
- Working with dates (10m 10s)
- User-defined functions (2m 54s)
- Working with joins (10m 59s)
- Challenge (29s)
- Solution (15m 3s)
5. Resilient Distributed Datasets (RDDs)
- RDDs (4m 6s)
- Working with RDDs (7m 45s)
Conclusion
- Next steps (40s)
Video: Set up the lab environment