When working with large tables in Hive, selecting a specific partition can dramatically improve query performance. This video shows how to identify the partitions and then what to add to your query to select one of them.
- [Narrator] When working with large tables in Hive, you can dramatically improve query performance by selecting only a specific partition of the table. And if you're setting up these tables, it often makes sense to separate them into partitions, at least by year, but often by other dimensions or categories. That way, if the analysts are only working in one section of that data, they have a much smaller set to analyze.
So let's take a look at that now in Partitioning Data in Hive. First, I have the exercise file for this video open here. I want to open up my web browser and go into the Hue UI. I'll go over to Hive, and here's where we're going to do our work. So first, let me just delete any previous queries that were there, and I'm going to create a new table called sales all years partitioned.
And if I scroll down to the bottom, it's partitioned by company name, which is a string, and this table's going to be stored as a text file. So let me paste that into my Hive query window…
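The statements described above might look roughly like the following HiveQL. This is a minimal sketch, not the contents of the exercise file: the identifier `sales_all_years_partitioned` is an assumed spelling of the spoken table name, the data columns (`order_id`, `order_date`, `sale_amount`), the comma delimiter, and the value `'Acme'` are hypothetical, and only the string partition column for company name and the text file storage come from the narration.

```sql
-- Sketch only: data columns and delimiter are hypothetical placeholders.
-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE IF NOT EXISTS sales_all_years_partitioned (
  order_id     INT,
  order_date   STRING,
  sale_amount  DOUBLE
)
PARTITIONED BY (company_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Identify the partitions that currently exist
-- (returns nothing until data has been loaded into the table).
SHOW PARTITIONS sales_all_years_partitioned;

-- Select a single partition by filtering on the partition column,
-- so Hive reads only that partition's files instead of the whole table.
-- 'Acme' is a hypothetical company name.
SELECT order_id, sale_amount
FROM sales_all_years_partitioned
WHERE company_name = 'Acme';
```

Filtering on the partition column in the `WHERE` clause is what lets Hive skip the other partitions; a filter on an ordinary column would still scan every partition.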
- Explain which commands are used to make changes in HDFS.
- Identify the commands used to upload data from the command line to HDFS.
- Recognize two operations HDFS performs when a user moves files.
- Summarize how to remove files recursively in HDFS.
- Recall how to select and implement partitions.
- Explain how to flatten a Struct data type in HiveQL.