Join Dan Sullivan for an in-depth discussion in this video, Sample data from DataFrames, part of Introduction to Spark SQL and DataFrames.
- [Instructor] Now in this lesson, we're going to take a look at sampling. Now we may want to use sampling sometimes, particularly when we have very large data sets, and we're doing kind of an exploratory analysis, we just want to get kind of an understanding at a high level of what the data is like. Sampling can be really useful for doing quick operations. So let me just get the kernel. I'm going to restart and clear the output just so we can start fresh here. And what I'm going to do is load the data. There, so this is our location temperature data set that we've been working with, and the first thing I want to do is check the data frame to find out how many rows are in there. So I'll just do a simple count, and we see there are 500,000 rows. So let's see how we can take a sample, or a subset of that, but randomly select a subset. So I'm going to create a new data frame, and I'm going to call it data frame one underscore S one for sample one, …
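The transcript ends before the sampling call itself is shown, so the following is only a sketch of the idea, not the instructor's code. In PySpark, a DataFrame is usually counted with `count()` and sampled with `DataFrame.sample(withReplacement, fraction, seed)`, where `fraction` is the probability of keeping each row rather than an exact size. The plain-Python version below imitates that behavior without needing a Spark installation. The helper `bernoulli_sample`, the stand-in `rows` list, the fraction of 0.1, and the seed of 42 are all assumptions made for illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    This imitates fraction-based sampling without replacement:
    the size of the result is approximate, not exact.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return [row for row in rows if rng.random() < fraction]


# Hypothetical stand-in for the 500,000-row data set from the lesson.
rows = list(range(500_000))
print(len(rows))  # the "simple count" step

# Take a random subset of roughly 10% of the rows.
sample_1 = bernoulli_sample(rows, fraction=0.1, seed=42)
print(len(sample_1))  # close to 50,000, but usually not exactly 50,000
```

Because each row is kept or dropped independently, two runs with different seeds will generally return samples of slightly different sizes, which is worth remembering when a count on a sampled DataFrame does not match the fraction exactly.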
- Installing Spark and PySpark
- Setting up a Jupyter notebook
- Loading data into DataFrames
- Filtering, aggregating, and saving data
- Querying and modifying DataFrames with SQL
- Exploratory data analysis
- Basic machine learning
Skill Level Intermediate
1. Introduction to Spark DataFrames
2. Installing Spark
3. Getting Started with Spark DataFrames
4. SQL for DataFrames
5. Data Analysis with Spark