From the course: Big Data Analytics with Hadoop and Apache Spark

Top student analytics

- [Instructor] In this video, we will find the top student for each subject in the data source. This is the time to evaluate whether we need to repartition. Given that none of our data frames was generated by an action and then needs further transformations, we don't need to look at repartitioning. Finding the top students by subject is a tricky use case, and there are multiple ways to achieve it. In this example, we first find the top score for each subject. Then we find the students who got that top score. To find the top score for each subject, we simply group by subject and find the maximum of the total score. The results are stored in the top score data frame. Then, we join the top score data frame with the total score data frame on both the subject and the total score value. This extracts the list of student results that had this top score. We then print the results and also run an explain of the…
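
The two steps described here (take the maximum score per subject, then join back on subject and score) can be sketched in plain Python. This is only an illustration of the logic, not the course's Spark code: the rows, the field names `student`, `subject`, and `total_score`, and the helper function names are all hypothetical. In Spark, the same shape would typically be a group-by with a max aggregation followed by a join.

```python
# Hypothetical rows standing in for the total score data frame:
# (student, subject, total_score). Names and values are made up.
total_scores = [
    ("Ana", "Math", 91),
    ("Ben", "Math", 85),
    ("Cho", "Math", 91),
    ("Ana", "Physics", 78),
    ("Ben", "Physics", 88),
]


def top_score_by_subject(rows):
    """Step 1: group by subject and keep the maximum total score."""
    best = {}
    for _student, subject, score in rows:
        if subject not in best or score > best[subject]:
            best[subject] = score
    return best


def top_students(rows):
    """Step 2: join back on (subject, total_score) to recover the students."""
    best = top_score_by_subject(rows)
    return sorted(
        (subject, student, score)
        for student, subject, score in rows
        if best[subject] == score
    )


if __name__ == "__main__":
    for row in top_students(total_scores):
        print(row)
```

Because the join matches on the score value rather than picking a single row, students who tie for the top score in a subject all appear in the result, as Ana and Cho do for Math in this made-up data.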
