Get up to speed with Spark, and discover how to leverage this powerful platform to efficiently and effectively work with big data.
- Since we started working with big data and trying to understand how to store vast amounts of information and then make use of it, we've developed many platforms and reinvented them over and over in search of the ultimate platform for working with big data. Apache Spark is the latest iteration of this, a platform that is enabling new ways to work with big data. Hi, I'm Ben Sullins, and I've been a data geek since the late '90s, focused on helping organizations get the most out of their data.
In this course, we'll look at how to use the Apache Spark platform for data science. I'll start with an overview of the platform and go through each component, so we have a baseline understanding of how it works. Then, we'll take a look at using Spark to analyze data with Python using PySpark, and then with Spark SQL. We'll explore machine learning techniques, and we'll finish by creating a streaming analytics application using Spark Streaming. We'll be covering all of these topics to get you up to speed with Spark and help you start delivering more effective and more comprehensive insights. Let's dive in.
- Understanding Spark
- Reviewing Spark components
- Where Spark shines
- Understanding data interfaces
- Working with text files
- Loading CSV data into DataFrames
- Using Spark SQL to analyze data
- Running machine learning algorithms using MLlib
- Querying streaming data
- Connecting BI tools to Spark