From the course: Cloud Hadoop: Scaling Apache Spark


Spark Job on Google Cloud Platform

- [Instructor] So, we spun up our Hadoop cluster in less than five minutes. What can we do with it? Well, as you can see, I clicked on Jobs, and I'm now in the Submit a job section. My example cluster was populated by default; if you had more than one cluster, it would show up here. For Job type, the default is Hadoop job, but I've selected Spark job, because we're going to be focusing on fast Hadoop, or in-memory Hadoop, in this course. You'll notice the libraries for PySpark, Hive, Spark SQL, and Pig are also loaded on the default image of the GCP managed Hadoop cluster. I then took some example files and loaded the path to the JAR file for a Spark example, set the entry point, or main class, and passed a value to the argument. What this job's going to do is calculate the digits of pi using the Spark library. It's probably easiest to just look at the output, so I'm going to go ahead and click Submit, and that'll run this job. Now, your screen might look different here, because I've run this job once before, and that's why I have two lines of it. The job on the bottom is completed. And you can see, in 36 seconds, if we click on the job and enable line wrapping, there is pi calculated. Now, I know at this point in the course you're probably not familiar with Spark, so you're probably wondering how a Hadoop job was able to run in 36 seconds. Typically, in the old Hadoop, the world of MapReduce, this would have been a batch job, and it probably would have taken several minutes. I really wanted to show this modern Hadoop configuration to pique your interest in this course. The world of Hadoop is changing: it's fast, it's easy to set up, and it's applicable to a lot more use cases.
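
To make the steps concrete, here is a minimal sketch of the kind of pi-estimating job described above, written as a PySpark script rather than the Scala SparkPi JAR submitted in the video (the app name, partition count, and sample count are illustrative assumptions, not values from the demo). It samples random points in the unit square and counts how many land inside the unit circle, which is the same Monte Carlo idea the example job uses.

from operator import add
from random import random

from pyspark.sql import SparkSession


def inside(_):
    # Pick a random point in the unit square; return 1 if it falls inside the unit circle.
    x, y = random(), random()
    return 1 if x * x + y * y <= 1.0 else 0


if __name__ == "__main__":
    # appName and the sample counts are illustrative choices, not values from the video.
    spark = SparkSession.builder.appName("EstimatePi").getOrCreate()

    partitions = 10  # plays a role similar to the single argument passed to the job in the demo
    samples = 100000 * partitions

    # Distribute the samples across the cluster, test each point, and sum the hits.
    count = (
        spark.sparkContext
        .parallelize(range(samples), partitions)
        .map(inside)
        .reduce(add)
    )

    # Four times the fraction of points inside the circle approximates pi.
    print("Pi is roughly %f" % (4.0 * count / samples))
    spark.stop()

On a managed cluster like the one in the video, a script along these lines could be submitted as a PySpark job instead of a Spark (JAR) job; the Submit a job form offers both job types.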
