From the course: Spark for Machine Learning & AI

Introduction to Spark

- [Instructor] Spark is a distributed data processing platform for big data. Let's break that statement down into its three components. Distributed means Spark runs on a cluster of servers. It runs equally well on a single server, and that's what we'll use in this course. In a production environment, however, you typically run a number of servers to work with large data sets. Data processing means it performs computation, and in the case of Spark, some of the most interesting computations are related to machine learning and data analysis. Big data is a term broadly applied to data sets that are not easily analyzed on a single server, or with older data management systems that were designed to run on a single server. Spark is becoming increasingly polyglot, with support for multiple languages. Software engineers familiar with Scala and Java can use those languages, while data scientists who prefer Python and R can work with those. We'll use Python in this course…
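
To make "distributed data processing" concrete, here is a minimal sketch in plain Python of the general pattern: split the data into partitions, compute on each partition independently, then combine the partial results. This is not Spark's API and not code from the course. The function names are hypothetical, and threads on one machine stand in for the servers of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks of roughly equal size."""
    size, remainder = divmod(len(data), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `remainder` partitions each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def partial_sum_of_squares(partition):
    """The computation each worker runs on its own partition."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(data, num_workers=4):
    """Illustrative only: worker threads stand in for cluster servers."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each partition is processed independently of the others.
        partial_results = list(pool.map(partial_sum_of_squares, partitions))
    # Combine the per-partition results into the final answer.
    return sum(partial_results)


total = distributed_sum_of_squares(list(range(1, 101)))
print(total)  # 338350
```

On a real cluster the partitions would live on different servers and a framework would handle scheduling and failures. The sketch only shows why work that splits cleanly across partitions can scale past a single machine.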
