Be able to select and sum columns from a table.
- [Voiceover] If your data is already in a Hive table and you are a SQL person, the syntax is identical. Well, almost identical, and so it becomes very, very comfortable to query these tables as though they were ordinary SQL tables. I've created a StackOverflow database, stackoverflow, and it has a series of tables. The first one is called comments. We can run USE stackoverflow and then select from comments, or we can explicitly give the database and table name every time we use it, as in stackoverflow.comments, which makes the query a little more self-documenting.
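To try the two naming styles side by side without a cluster, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite has no USE statement, so this is an analogy, not Hive behavior: ATTACH ... AS stackoverflow creates a separately named database, and an unqualified name such as comments is resolved by searching the available databases, so both queries below read the same table. Only the names stackoverflow and comments come from the transcript; the rows are made up.

```python
import sqlite3

# sqlite3 stands in for Hive here; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")

# ATTACH gives us a second, separately named database, playing the role
# of the Hive database "stackoverflow".
conn.execute("ATTACH DATABASE ':memory:' AS stackoverflow")
conn.execute("CREATE TABLE stackoverflow.comments (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO stackoverflow.comments VALUES (?, ?)",
    [(1, 3), (2, 0), (3, 7)],
)

# Unqualified name: SQLite searches temp, main, then attached databases.
unqualified = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]

# Qualified name: the database is stated explicitly, like stackoverflow.comments.
qualified = conn.execute("SELECT COUNT(*) FROM stackoverflow.comments").fetchone()[0]

print(unqualified, qualified)
```

In Hive itself the unqualified form depends on the current database set by USE, which is why the qualified form is the more self-documenting choice.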
So we are going to select all of the scores for all of the comments and calculate the sum, and SUM is going to be our aggregate function. And you can see it's very, very simple: we're literally just going to select the sum of score from stackoverflow.comments. Very clean, very simple. To run it, we use hive -f, which lets us pass Hive a file: select_aggregate.hql. And here we go, the job's kicking off, and the dependencies are being loaded…
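A minimal sketch of the same flow, again using sqlite3 as a local stand-in for Hive: the query text is written to a file named select_aggregate.hql, then read back and executed, loosely mirroring what hive -f does with a script file. The query SELECT SUM(score) FROM stackoverflow.comments is valid in both engines once a database named stackoverflow exists. The helper run_query_file and the sample scores are hypothetical; on a real cluster you would simply run hive -f select_aggregate.hql.

```python
import sqlite3
import tempfile
from pathlib import Path

# Query text as it might appear in select_aggregate.hql.
QUERY = "SELECT SUM(score) FROM stackoverflow.comments;"


def run_query_file(conn, path):
    """Hypothetical helper: read one SELECT statement from a file and
    return its first row, loosely analogous to `hive -f <file>`."""
    sql = Path(path).read_text().strip().rstrip(";")
    return conn.execute(sql).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS stackoverflow")
conn.execute("CREATE TABLE stackoverflow.comments (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO stackoverflow.comments VALUES (?, ?)",
    [(1, 3), (2, 0), (3, 7)],  # hypothetical sample rows
)

with tempfile.TemporaryDirectory() as tmp:
    hql = Path(tmp) / "select_aggregate.hql"
    hql.write_text(QUERY + "\n")
    (total,) = run_query_file(conn, hql)

print(total)  # 10
```

Keeping the query in its own file, as the transcript does, makes it easy to rerun or version the query independently of whatever launches it.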
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop Distributed File System (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administering clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups