
Apache Spark architecture

- [Instructor] The reason Spark is a big deal is that it's very efficient at distributing the processing of massive data sets across a cluster, and it does so reliably. Architecturally, it looks like this. You write a driver script in Python, Scala, or Java that defines how you want to process your data using the APIs Spark provides. Once you launch that driver script, usually from the master node of your cluster, it communicates with your cluster manager to allocate the resources it needs from the cluster. That cluster manager could be Hadoop's YARN cluster manager if you're running Spark on top of a Hadoop cluster, or Spark's own built-in cluster manager if you want to use that instead. The cluster manager allocates a set of executor processes across your cluster that do the actual work of processing your data. Spark handles all the details of figuring out how to divide your data up and process it in the most efficient manner. It does this by keeping…
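To make that concrete, here is a minimal sketch of what such a driver script might look like in PySpark. This isn't from the course; the app name and the local[*] master setting are assumptions for trying it out on a single machine rather than a real cluster.

from pyspark.sql import SparkSession

# The driver script only describes the computation; the cluster
# manager allocates executors, and the executors do the actual work.
spark = (SparkSession.builder
         .appName("DriverScriptSketch")  # hypothetical app name
         .master("local[*]")             # assumption: run locally using all cores
         .getOrCreate())

# Spark splits this data into partitions and hands each partition
# to an executor process somewhere on the cluster.
rdd = spark.sparkContext.parallelize(range(1000000))

# The map and reduce defined on the driver run on the executors.
total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
print("Sum of squares:", total)

spark.stop()

On a real cluster you would typically leave the master setting out of the script and launch it with spark-submit, passing --master yarn (or the address of Spark's standalone master) so the cluster manager decides where the executors run.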
