From the course: Cloud Hadoop: Scaling Apache Spark

Spark ML: Preparing data

- [Instructor] Next, we're going to take a look at machine learning on Spark. To do that, in our workspace, we'll import a Spark ML notebook. Now, this is a longer example, so I'm going to break it into a couple of parts. If you're new to machine learning, you might want to look at the external resource that I've referenced. In this notebook, we're going to return to the business problem of looking at the farmers market dataset, and we're going to explore this hypothesis: the number of farmers markets in a given zip code can be predicted from the income and taxes paid in that area. Now, what I've done is categorize the work steps. In the first part, we're going to load and prepare the data. We're not actually using machine learning here; we're using data preparation steps that are typical when you work with machine learning. But it allows us to review some of the…
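
To make the shape of that preparation step concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. It is not the notebook's code: the field names (`zip`, `income`, `taxes_paid`) and the toy rows are assumptions made up for illustration. It counts markets per zip code, then pairs each zip code's income and taxes (the features) with that count (the label).

```python
from collections import Counter

# Hypothetical toy rows; the real dataset's columns and values will differ.
markets = [
    {"market": "A", "zip": "10001"},
    {"market": "B", "zip": "10001"},
    {"market": "C", "zip": "94103"},
]
tax_by_zip = [
    {"zip": "10001", "income": 85000.0, "taxes_paid": 21000.0},
    {"zip": "94103", "income": 97000.0, "taxes_paid": 26000.0},
    {"zip": "60601", "income": 72000.0, "taxes_paid": 17000.0},
]


def prepare(markets, tax_by_zip):
    """Build one (features, label) row per zip code in the tax data."""
    # Label: number of markets observed in each zip code.
    counts = Counter(row["zip"] for row in markets)
    prepared = []
    for row in tax_by_zip:
        prepared.append({
            "zip": row["zip"],
            # Features: the two predictors named in the hypothesis.
            "features": [row["income"], row["taxes_paid"]],
            # Zip codes with no markets get a label of 0.
            "label": counts.get(row["zip"], 0),
        })
    return prepared


rows = prepare(markets, tax_by_zip)
for r in rows:
    print(r)
```

In Spark, the same idea would typically be expressed as a group-by count joined to the tax data, with the predictor columns assembled into a single feature vector before any model is trained.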
