Learn about DataFrames, a widely used data structure in Apache Spark. Discover how to manipulate and analyze distributed data with the DataFrame API and SQL.
- [Dan] Apache Spark and SQL are both widely used for data analysis and data science. In this course, we'll introduce DataFrames, the foundational data structure in Apache Spark, and see how to use SQL when working with them. We'll cover installing Spark, using Jupyter notebooks, and loading data from CSV and JSON files into Spark. You'll learn about basic operations like filtering and aggregating, using both the DataFrame API and SQL. You'll also learn more advanced techniques like joining data, eliminating duplicates, and working with null values. We'll also develop techniques for exploratory data analysis, including analyzing time series data, using clustering, and applying linear regression. So join us now to learn about Apache Spark, SQL, and how to do data analysis with the two together.
- Installing Spark and PySpark
- Setting up a Jupyter notebook
- Loading data into DataFrames
- Filtering, aggregating, and saving data
- Querying and modifying DataFrames with SQL
- Exploratory data analysis
- Basic machine learning