From the course: Big Data Analytics with Hadoop and Apache Spark
Reading bucketed data
- [Instructor] In this video, I will show you how Spark reads bucketed data stored in Hive. We can read data in Hive using a SQL command. We do a simple SELECT statement to read the entire table and print its contents. We also print its execution plan. Let's run this code and examine the results. When we look at the execution plan, we can see that it is no different than reading a file from HDFS. DataFrames, Datasets, SQL, and RDDs provide different interfaces to the same underlying operations, so the execution plans will be similar irrespective of which API we use. The plan shows which HDFS file is read and will also provide partition information if partitioning is used. We will review some of the best practices for reading data into Spark in the next video.
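The steps described above can be sketched in PySpark as follows. This is a minimal sketch, not the instructor's exact code: it assumes a Spark session with Hive support enabled and a hypothetical bucketed Hive table named `sales_bucketed` that already exists in the metastore.

```python
from pyspark.sql import SparkSession

# Create a Spark session with Hive support so SQL queries
# can resolve tables registered in the Hive metastore.
# (Assumes a working Hive metastore is configured.)
spark = (
    SparkSession.builder
    .appName("ReadBucketedData")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the entire bucketed table with a simple SELECT statement.
# "sales_bucketed" is a hypothetical table name used for illustration.
df = spark.sql("SELECT * FROM sales_bucketed")

# Print the table's contents.
df.show()

# Print the physical execution plan. For a bucketed Hive table this
# looks much like a plain HDFS file scan; the plan names the file(s)
# read and includes partition details when the table is partitioned.
df.explain()
```

Because DataFrames, Datasets, SQL, and RDDs all compile down to the same underlying execution engine, `spark.table("sales_bucketed").explain()` would produce an equivalent plan to the SQL version above.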