From the course: Spark for Machine Learning & AI


Linear regression

Linear regression - Apache Spark Tutorial


- [Instructor] Now that we have some data to work with, let's look at linear regression. So, I'm just going to verify I'm in the right directory. Great, now I'm going to start pyspark. Okay, the first thing I want to do is import some code for linear regression. So I'll use my from pyspark.ml command, and I'll import LinearRegression. Now, in our last video, we downloaded and preprocessed a file called power_plant.csv. I'm going to read that into a data frame, and I'll call it pp for power plant data frame. I'll reference the spark context and read the CSV file, which is in my home directory and is called power_plant.csv. And let's just take a look at the structure. I forgot to indicate that there was a header in that file. As you may recall, the first row of the file holds the column names. So I'm going to re-execute this read and say header=True. Another thing you'll notice is that the column data types are all string. That's because I forgot to indicate that I…
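
The read step described so far can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the instructor's exact code: the helper name `read_power_plant` is hypothetical, `spark` is assumed to be the session object that the pyspark shell creates on startup, and the file path is passed in because the transcript does not give it in full. The transcript is cut off before the fix for the all-string column types is named, so `inferSchema=True`, an option the Spark CSV reader does provide, is assumed here as one plausible way to get typed columns.

```python
# Sketch only. `spark` is assumed to be the session object that the pyspark
# shell creates; `read_power_plant` is a hypothetical helper name.
# The import mentioned in the transcript would presumably be:
#   from pyspark.ml.regression import LinearRegression


def read_power_plant(spark, path, header=True, infer_schema=True):
    """Read the power plant CSV file into a DataFrame.

    header=True treats the first row as column names rather than data.
    infer_schema=True asks Spark to derive column types from the data
    instead of leaving every column as string. This second option is an
    assumption: the transcript ends before the instructor names the fix.
    """
    return spark.read.csv(path, header=header, inferSchema=infer_schema)
```

In the pyspark shell this might be called as `pp = read_power_plant(spark, "power_plant.csv")`, followed by `pp.printSchema()` to check the column names and types. Passing `header=False, infer_schema=False` mirrors the first attempt in the video, where the header row is read as data and every column comes back as string.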
