From the course: Big Data Analytics with Hadoop and Apache Spark

Reading bucketed data

- [Instructor] In this video, I will show you how Spark reads bucketed data stored in Hive. We can read data from Hive using a SQL command: a simple SELECT statement reads the entire table, and we print its contents along with its execution plan. Let's run this code and examine the results. Looking at the execution plan, we can see that it is no different from reading a file from HDFS. DataFrames, Datasets, SQL, and RDDs provide different interfaces to the same underlying operations, so the execution plans will be similar irrespective of which API we use. The plan also shows which HDFS file is read and will provide partition information if partitioning is used. In the next video, we will review some best practices for reading data into Spark.
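
The following is a minimal PySpark sketch of the step described above, not the course's exact code. It assumes a SparkSession created with Hive support and a hypothetical bucketed Hive table named sales_bucketed; the table name is illustrative only.

    # Sketch: read a bucketed Hive table with SQL and print its execution plan.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("ReadBucketedData")
        .enableHiveSupport()   # needed so Spark can see Hive-managed tables
        .getOrCreate()
    )

    # Read the entire table with a plain SELECT statement.
    df = spark.sql("SELECT * FROM sales_bucketed")

    # Print the table contents.
    df.show()

    # Print the physical execution plan; it resembles a regular HDFS file scan
    # and includes partition details when the table is partitioned.
    df.explain()
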
