At the end of this video the learner will know how to handle quoted CSV files in Hive using a custom SerDe.
- [Instructor] Now let's take a look at handling CSV files in Hive. CSV files have a quirk: if a value in the file needs to include a comma, say the name of a company, the file puts quotes around that value. This poses a challenge for us when we're working with the data in Hive. In fact, we have to implement a custom engine known as a SerDe, short for serializer/deserializer, just to handle those files properly. So what I'm going to do is open up the HUE Metastore Manager and upload a file that has these quoted strings. Then I'll show you how to use the custom SerDe, and we'll apply that to our table settings and see what it does.
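To make the quoting problem concrete, here is a minimal Python sketch, not part of the course files. It compares a delimiter-only split, which ignores quotes, with a quote-aware parser, which is the job a CSV SerDe takes on. The sample rows and the helper names `naive_split` and `quote_aware_split` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample data: the company name contains a comma, so the
# CSV file wraps that value in double quotes.
sample = 'order_id,company,amount\n1,"Acme, Inc.",250.00\n2,Globex,99.50\n'


def naive_split(text):
    """Split every line on commas, ignoring quotes entirely."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_split(text):
    """Parse with the standard csv module, which honors quoted values."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = naive_split(sample)
parsed_rows = quote_aware_split(sample)

# The naive split breaks the quoted company name into two fields,
# shifting the amount into the wrong column.
print(len(naive_rows[1]), naive_rows[1])

# The quote-aware parser keeps the company name as a single field.
print(len(parsed_rows[1]), parsed_rows[1])
```

The first print shows four fields for a row that should have three, which is the kind of column shift that shows up in a table when quoted values are read without a quote-aware SerDe.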
So here in HUE, the first thing I want to do is open the Metastore Manager, and I'm going to create a new table from a file. I'm going to call this sales_withcomma. I'll click on the ellipsis and upload the file. Under Exercise Files and Data, we have the one that ends with WithCommas.csv, so we'll choose this file and click Next.
This course shows how to use Hive to process data. Instructor Ben Sullins starts by showing you how to structure and optimize your data. Next, he explains how to get Hue, the Hadoop user interface, to leverage HiveQL when analyzing data. Using the newly configured option, he then demonstrates how to load data, create aggregate tables for fast query access, and run advanced analytics. He also takes you through managing tables and putting functions to use. This course is designed to help you find new ways to work with datasets so you can answer the tough data science questions that come your way.
- Defining data structures in Hive
- Selecting data
- Joining tables
- Manipulating data
- Filtering results
- Aggregating data
- Using built-in aggregate functions
- Mastering built-in table-generating functions
- Using CUBE and ROLLUP
- Using clauses: WHERE and HAVING
- Using LIKE, JOIN, and SEMI JOIN
- Using functions: String, math, date, and conditional