Join Ben Sullins for an in-depth discussion in this video, Using the exercise files, part of Analyzing Big Data with Hive.
- [Instructor] If you have access to the exercise files for this course, you can download them to your desktop, as I've done here. There are three folders. The first is the data folder, which has all the data we'll be working with in our Hadoop environment. The scripts folder has all the SQL scripts we're going to use, named to match the video numbers: 3-1 is for chapter three, the first video, 4-1 is for chapter four, and so on. The setup folder has the other scripts we're going to use, including a custom jar that processes the data differently, and some other statements that we'll refer to throughout the videos, which are the code samples we bring in to set up our Hive environment.
If you're viewing this course on a mobile device, or you don't have access to the exercise files, that's okay. You can still follow along by watching how I use the files.
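The script numbering described in the transcript (3-1 for chapter three, first video) can be sketched as a small helper that maps a script name back to its chapter and video. This is only an illustration: the `.sql` extension, the `ScriptRef` type, and the `parse_script_name` function are assumptions made for the example, not names taken from the exercise files.

```python
import re
from typing import NamedTuple, Optional


class ScriptRef(NamedTuple):
    """Chapter and video number encoded in a script name (hypothetical type)."""
    chapter: int
    video: int


# Assumed pattern: "<chapter>-<video>" at the start of the name, e.g. "3-1.sql".
_SCRIPT_NAME = re.compile(r"^(\d+)-(\d+)")


def parse_script_name(filename: str) -> Optional[ScriptRef]:
    """Return the chapter/video pair for a name like '3-1.sql', or None if it does not match."""
    match = _SCRIPT_NAME.match(filename)
    if match is None:
        return None
    return ScriptRef(chapter=int(match.group(1)), video=int(match.group(2)))


print(parse_script_name("3-1.sql"))  # ScriptRef(chapter=3, video=1)
```

A name that does not follow the numbering, such as a readme file, returns `None` rather than raising an error.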
This course shows how to use Hive to process data. Instructor Ben Sullins starts by showing you how to structure and optimize your data. Next, he explains how to get Hue, the Hadoop user interface, to leverage HiveQL when analyzing data. Using the newly configured option, he then demonstrates how to load data, create aggregate tables for fast query access, and run advanced analytics. He also takes you through managing tables and putting functions to use. This course is designed to help you find new ways to work with datasets so you can answer the tough data science questions that come your way.
Topics include:
- Defining data structures in Hive
- Selecting data
- Joining tables
- Manipulating data
- Filtering results
- Aggregating data
- Using built-in aggregate functions
- Mastering built-in table-generating functions
- Using CUBE and ROLLUP
- Using clauses: WHERE and HAVING
- Using LIKE, JOIN, and SEMI JOIN
- Using functions: String, math, date, and conditional