Be able to select and sum columns from a table.
- [Voiceover] Select is for each, and for each is select. Select is just the SQL terminology for it, so people who come from an SQL background will think of it as SELECT, but in Pig it's called FOREACH ... GENERATE. Aggregation is where you take a large amount of data and make it much, much smaller, or create some summary representation of it. So, for example, if you were to calculate the sum across a bunch of different rows in a table, that would be an aggregation. If you wanted to calculate the average as well, that would also be an aggregation. In Pig, aggregation is almost always done in a FOREACH.
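As a quick sketch of the idea above, assuming a relation called `comments` with a numeric `score` field (both names are hypothetical, not from the course files), a sum and an average aggregation inside a FOREACH ... GENERATE might look like:

```pig
-- Group every row into a single bag, then aggregate inside FOREACH ... GENERATE.
-- 'comments' and its 'score' field are illustrative example names.
grouped = GROUP comments ALL;
totals  = FOREACH grouped GENERATE
            SUM(comments.score) AS total_score,
            AVG(comments.score) AS avg_score;
DUMP totals;
```

GROUP ... ALL collapses the whole relation into one group, which is why a SUM or AVG over every row is expressed this way: the aggregate function runs over the bag produced by the grouping.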
So all we're gonna do is open up the Grunt interpreter for Pig by typing pig and then hitting Enter. Once we do that, we'll have the Pig interpreter, and we'll load up the data, which exists on HDFS. Once that comment data is loaded using the proper schema, we'll be able to interact with it as though it were just about any other type of data format. To do that, we're gonna use PigStorage. PigStorage lets you read delimited text data.
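A minimal sketch of that load step, assuming the comment data sits at a hypothetical HDFS path and is tab-delimited (the path, field names, and types below are illustrative, not the course's actual files):

```pig
-- Load tab-delimited comment data from HDFS using PigStorage,
-- declaring a schema so each field has a name and a type.
-- The path and schema here are assumptions for illustration.
comments = LOAD '/user/hadoop/comments.tsv'
           USING PigStorage('\t')
           AS (id:int, author:chararray, score:int, body:chararray);

-- Confirm the schema Pig attached to the relation.
DESCRIBE comments;
```

Once the relation has a schema like this, later statements can refer to fields by name (`comments.score`) instead of by position.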
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop distributed file system (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administering clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups