Learn how WEKA can run on Spark.
- Although running WEKA in its stand-alone mode is perfectly fine for many everyday datasets, we do need to tap into the power of distributed processing from time to time, especially when our dataset falls in the realm of big data. Since well-known tools like Spark are already available for distributed processing, it would be ideal if WEKA could leverage that technology, and WEKA does provide a way to harness the power of Spark. Let's see how we can configure WEKA to take advantage of Spark.
From the GUI Chooser, go to Tools and select Package Manager. In the package search box, type "Spark" and press Enter. Choose distributedWekaSpark. Click Install and click Yes. Click OK. Click Yes.
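If you prefer to script the install rather than click through the Package Manager, WEKA also ships a command-line package manager. The sketch below assumes a standard WEKA install with `weka.jar` on the classpath; the exact jar path on your machine will differ:

```shell
# Install the distributedWekaSpark package from the command line
# (assumes weka.jar is at ./weka.jar; adjust the path for your install)
java -cp weka.jar weka.core.WekaPackageManager -install-package distributedWekaSpark

# List installed packages to confirm it took effect
java -cp weka.jar weka.core.WekaPackageManager -list-packages installed
```

As with the GUI route, WEKA has to be restarted afterwards for the new package to load.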
Once the Spark package is installed, we need to restart WEKA. Close the Package Manager, close the WEKA GUI Chooser, and start WEKA again. Click OK. To let you run distributed WEKA jobs in a user-friendly way, WEKA provides an option called KnowledgeFlow. Click on KnowledgeFlow, and the WEKA KnowledgeFlow environment shows up.
After installing the distributedWekaSpark package, you'll see a new folder called Spark. Let's click on the arrow to expand it. You can create your own KnowledgeFlow from scratch, but an easier way is to start with a template flow. Let's click on the Template Flows icon. As you can see here, there are already quite a few template flows available for Spark jobs.
Let's choose "Spark: create an ARFF header". We've already finished our Spark installation on the same virtual machine where we installed WEKA, and that Spark installation is ready to handle WEKA requests. Let's double-click on the Spark job icon, which opens a Spark configuration dialog. There's a field called "Master Host", which is what connects WEKA to the Spark installation on the local host.
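For context, Spark itself identifies its master through a standard master URL, and the value you point WEKA at typically follows the same conventions (how the dialog splits this into fields may vary by package version, so treat these as illustrative):

```
local[*]                 # run Spark in-process on this machine, using all cores
spark://localhost:7077   # connect to a standalone Spark master on the local host
                         # (7077 is the standalone master's default port)
```

Since our Spark installation lives on the same virtual machine as WEKA, pointing the job at the local host is all that's needed here.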
That's it! This is all it takes to run your WEKA jobs on Spark.
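As a reference for what this first template is building, an ARFF header is just the metadata section of WEKA's native file format: the relation name plus one `@attribute` declaration per column, ending at the `@data` marker. A minimal example (the relation and attribute names here are made up for illustration) looks like this:

```
% Minimal ARFF header (illustrative names; '%' starts a comment in ARFF)
@relation my_dataset
@attribute petal_length numeric
@attribute petal_width  numeric
@attribute species      {setosa, versicolor, virginica}
@data
```

Building this header is a natural first distributed job: Spark can scan a large CSV in parallel to work out attribute types and nominal values, and the resulting header is then reused by the other distributed WEKA jobs.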
- Enabling technologies in data science
- Cloud computing and virtualization
- Installing and working with Proxmox, Hadoop, Spark, and WEKA
- Managing virtual machines on Proxmox
- Distributed processing with Spark
- Fundamental applications of machine learning
- Distributed systems and distributed processing
- How Hadoop, Spark, and WEKA can work together