From the course: Data Science Tools of the Trade: First Steps


Spark: Spark shell

- [Voiceover] One of the most important aspects of Spark is its use of Resilient Distributed Datasets, or RDDs, to achieve fault tolerance. Once created, an RDD can be transformed into another RDD, or you can take an action on an RDD. Let's create our first RDD from the README file stored in our Spark directory. You can see the README file in the /usr/local/spark directory. So let's quit Spark for now and check out the README file: type ls. The README.md file is there. Let's start the Spark shell again. Let's call our first RDD textFile: type val textFile = spark.read.textFile("README.md") and press Enter. Looks like it worked. Now let's take some actions on the newly created RDD: type textFile.first() and press Enter. This action returns the first item in the dataset, which is # Apache Spark. That line appears first in the README.md file. Let's take another action. Type textFile.count() and press Enter. This action counts the number of items in the dataset, which is our…
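The session described above can be sketched as a Spark shell transcript. This is a hedged sketch, not a verbatim capture of the video: it assumes Spark is installed under /usr/local/spark and that the shell is started from that directory, so README.md resolves by relative path. Note that in Spark 2.x and later, spark.read.textFile actually returns a Dataset[String] rather than a classic RDD, though the transformation/action model the narrator describes applies to both.

```
$ cd /usr/local/spark    # assumed install path
$ ls                     # confirm README.md is present
$ ./bin/spark-shell

scala> // Create a Dataset of the lines of README.md
scala> val textFile = spark.read.textFile("README.md")

scala> // Action: return the first item in the dataset
scala> textFile.first()
res0: String = # Apache Spark

scala> // Action: count the number of items (lines) in the dataset
scala> textFile.count()
```

If you want a true RDD rather than a Dataset, you could instead call sc.textFile("README.md") on the SparkContext; either object supports the same first() and count() actions shown here. The exact count returned depends on your Spark version's README.md, so no fixed number is shown.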
