From the course: Big Data Analytics with Hadoop and Apache Spark
Average score analytics
In this video, we will compute the average total score for each student across all subjects. To begin, we cache the total score data frame so that we don't have to read the data source and compute the total scores again and again. Then we compute the average score for each student by executing an action: we group by the student and compute the average of the total score. We display the results and, lastly, the execution plan. Let's execute the code now. First, we see that the average total score shows up correctly for each student, as desired by the use case. Then, in the execution plan, we see that an in-memory table scan has been used. This means that the cache is working and the total scores are not being computed again. We then look at Spark job number four. We see that shuffling happens, as expected, because the action runs a group-by aggregation, but we also see that an entire stage has been skipped because of caching.…