From the course: Big Data Analytics with Hadoop and Apache Spark

Average score analytics

In this video, we will compute the average of the total score for each student across all subjects. To begin, we cache the total score data frame so that we don't have to read the data source and compute the total scores again and again. Then we compute the average score for each student: we group by the student and compute the average of the total score. We display the results by executing an action and, lastly, print the execution plan. Let's execute the code now. First, we see that the average total score shows up correctly for each student, as desired by the use case. Then, in the execution plan, we see that an in-memory table scan has been used. This means that the cache is working and the total scores are not being computed again. We then look at Spark job number four. We see that there is shuffling happening, as expected for a group-by aggregation run by an action, but we also see that an entire stage has been skipped because of caching.…
