From the course: Big Data Analytics with Hadoop and Apache Spark
Top student analytics
- [Instructor] In this video, we will find the top student for each subject in the data source. This is the time to evaluate whether we need to repartition. Since we have not generated any of these data frames from an action that requires further transformations, we don't need to consider repartitioning. Finding the top students by subject is a tricky use case, and there are multiple ways to achieve it. In this reference solution, we first find the top score for each subject. Then we find the students who achieved that top score. To find the top score for each subject, we simply group by subject and find the maximum of the total score. The results are stored in the top score data frame. Then, we join the top score data frame with the total score data frame on both the subject and the total score value. This extracts the results of every student who achieved the top score, so ties are preserved. We then print the results and also run an explain of the…
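The two-step approach described above (aggregate the maximum score per subject, then join back on both subject and score) can be illustrated with a minimal plain-Python sketch. It does not use Spark; it only mirrors the logic. The record layout `(student, subject, total_score)` and the function name `top_students_by_subject` are hypothetical, since the course's actual data files and column names are not shown here. In PySpark, the same idea would roughly correspond to a `groupBy("subject").agg(...)` with a max aggregation, followed by a `join` on both columns.

```python
def top_students_by_subject(rows):
    """Return every (student, subject, total_score) row that holds the top
    score for its subject. Assumes rows are (student, subject, total_score)
    tuples; this layout is hypothetical."""
    # Step 1: "group by subject" and keep the maximum total score per subject.
    top_score = {}
    for _student, subject, score in rows:
        if subject not in top_score or score > top_score[subject]:
            top_score[subject] = score

    # Step 2: "join" back on BOTH subject and score. Matching on the pair
    # keeps every student tied for the top score, not just one of them.
    top_pairs = set(top_score.items())
    return [row for row in rows if (row[1], row[2]) in top_pairs]


# Hypothetical sample data, for illustration only.
rows = [
    ("Alice", "Math", 91),
    ("Bob", "Math", 95),
    ("Cara", "Math", 95),
    ("Dan", "Physics", 88),
    ("Eve", "Physics", 80),
]
print(top_students_by_subject(rows))
```

With this sample data, both Bob and Cara are returned for Math because they tie at 95, which is why the join uses the score as well as the subject rather than the subject alone.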