From the course: Data Science Tools of the Trade: First Steps

Spark architecture and features

- Spark brings general-purpose computing to Hadoop. It can handle both real-time and batch processing, which makes Hadoop more flexible. In cluster mode, where multiple computers work together to get a job done, you designate one machine as the master; the rest of the computers in the cluster are known as workers. When you submit a job to the cluster, one of the workers starts a driver program, which in turn creates a SparkContext object. The SparkContext object then connects to a cluster manager on the master, which could be Spark's own standalone resource manager or YARN. Next, Spark acquires executors on the worker computers, which run computations and store data for the driver program. You can run Spark in single-node mode too. Regardless of which mode you use to run Spark, you can connect to the master interactively and issue commands. This interactive environment is called the Spark shell. Spark is written in Scala. The Spark shell gives full…
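Here is a minimal Scala sketch of the driver-program flow described above. The application name, host name, and master URL are illustrative assumptions: "spark://master-host:7077" would point at Spark's standalone cluster manager, "yarn" would hand scheduling to YARN, and "local[*]" would run in single-node mode on all local cores.

  import org.apache.spark.{SparkConf, SparkContext}

  object DriverSketch {
    def main(args: Array[String]): Unit = {
      // Configure the application; the master URL below is an
      // illustrative assumption, not a real cluster address.
      val conf = new SparkConf()
        .setAppName("ArchitectureSketch")
        .setMaster("spark://master-host:7077")

      // Creating the SparkContext connects to the cluster manager,
      // which then acquires executors on the worker machines.
      val sc = new SparkContext(conf)

      // A trivial job: executors on the workers compute partial sums
      // of the partitions, and the driver combines the results.
      val total = sc.parallelize(1 to 100).reduce(_ + _)
      println(s"Sum = $total")

      sc.stop()
    }
  }

The Spark shell performs the SparkConf/SparkContext setup for you, exposing a ready-made context so you can issue commands like the parallelize/reduce job above interactively.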
