- Installing Spark and PySpark
- Setting up a Jupyter notebook
- Loading data into DataFrames
- Filtering, aggregating, and saving data
- Querying and modifying DataFrames with SQL
- Exploratory data analysis
- Basic machine learning
Skill Level: Intermediate
- [Dan] Apache Spark and SQL are both widely used for data analysis and data science. In this course, we'll introduce DataFrames, the foundational data structure in Apache Spark. We'll also see how to use SQL when working with DataFrames. We'll learn about installing Spark, using Jupyter notebooks, and loading data from CSV and JSON files into Spark. You'll learn about basic operations like filtering and aggregating, using both the DataFrame API and SQL. You'll also learn more advanced techniques like joining data, eliminating duplicates, and working with null values. We'll also develop techniques for exploratory data analysis, including analyzing time series data, using clustering, and applying linear regression. So join us now to learn about Apache Spark, SQL, and how to do data analysis with the two together.
1. Introduction to Spark DataFrames
2. Installing Spark
3. Getting Started with Spark DataFrames
4. SQL for DataFrames
5. Data Analysis with Spark