Data in Hive is often stored at a low (atomic) level of detail, which makes it difficult to run analytical workloads against. A common solution is to create aggregate or summary tables. This video shows how to accomplish this using HiveQL.
- [Instructor] When you're working with data in Hive, one of the things you'll most likely end up doing is creating aggregate tables, which are rollup tables that summarize the data for you. The reason you do that is because it's going to be a lot faster when you want to query and analyze it. So if you have a very low level of detail, say the exact minute or second that an event occurred, like a sales order, but you often find yourself reporting on a weekly or monthly basis, it makes sense to create these tables in advance. That way, when you run your dashboards, or however you provide those analytics, you have a really small and easy table to work from.
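As a minimal sketch of the idea (the table and column names here are hypothetical, not taken from the course files), a monthly rollup table in HiveQL might look like:

```sql
-- Hypothetical detailed table: sales(order_id, amount, order_ts)
-- Pre-aggregate to one row per month so dashboards scan far fewer rows.
CREATE TABLE sales_monthly AS
SELECT
  date_format(order_ts, 'yyyy-MM') AS order_month,
  COUNT(*)                         AS order_count,
  SUM(amount)                      AS total_amount
FROM sales
GROUP BY date_format(order_ts, 'yyyy-MM');
```

Queries and dashboards then read `sales_monthly` directly instead of re-scanning and re-grouping the atomic `sales` data on every request.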
So that's what we're going to take a look at here: creating some aggregates in our Hive environment. First, what I'm going to do is move the sales files that we downloaded earlier over to a new location in HDFS. If I'm already in where I've downloaded my exercise files to, I'm just going to run hadoop fs -put, and the folder is data/sales-yearly, …
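A sketch of that upload step is below. The local folder `data/sales-yearly` comes from the transcript, but the destination path is an assumption — the excerpt cuts off before the instructor names the target directory.

```shell
# Upload the local sales files into HDFS.
# /user/hive/sales-yearly is a placeholder destination, not from the course.
hadoop fs -mkdir -p /user/hive/sales-yearly
hadoop fs -put data/sales-yearly/* /user/hive/sales-yearly/

# Verify the files landed where expected.
hadoop fs -ls /user/hive/sales-yearly
```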
- Explain which commands are used to make changes in HDFS.
- Identify the commands used to upload data from the command line to HDFS.
- Recognize two operations HDFS performs when a user moves files.
- Summarize how to remove files recursively in HDFS.
- Recall how to select and implement partitions.
- Explain how to flatten a Struct data type in HiveQL.
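The last objective, flattening a Struct, can be sketched in HiveQL as follows (the table and field names are hypothetical — dot notation is the standard way to project struct fields as plain columns):

```sql
-- Hypothetical table with a Struct column:
--   customers(id INT, address STRUCT<city:STRING, zip:STRING>)
-- Dot notation flattens each struct field into its own column.
SELECT
  id,
  address.city AS city,
  address.zip  AS zip
FROM customers;
```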