Join Jack Dintruff for an in-depth discussion in this video, Filters, part of Data Analysis on Hadoop.
- [Voiceover] Filter is a filter. It lets you take data and remove some of it based on some Boolean criteria. So you can say if this particular field, right, is equal to some value, I want it. Otherwise, I don't care, get rid of it. So let's say you have a data set but you have a very specific inquiry, like there's some disruptive user and you want to look them up and try to figure out some information about them from a fraud, abuse, and risk standpoint. That's usually what very specific filters are for, but they can also be for a very generic data set with a lot of fields.
You can say, "Well, I only want fields that have this value." And you will get many, many records, but they all pertain to what you're looking for. So, really it's just a way for you to eliminate the noise and get at the real data that you're interested in. So, in this exercise what we're going to do is filter the users database for a particular user. We found in the last exercise that User ID 267 has the highest score, or karma, among the Android community users on Stack Overflow,…
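The idea described in the transcript, keeping only the records that satisfy a Boolean test on a field, can be sketched in a few lines of Python. This is only an illustration, not the code used in the video: the sample records, the field names (`id`, `display_name`, `reputation`), and the helper `filter_records` are all hypothetical. Only the user ID 267 comes from the transcript.

```python
# Hypothetical sample records; names and values are invented for illustration.
users = [
    {"id": 12, "display_name": "user_a", "reputation": 150},
    {"id": 267, "display_name": "user_b", "reputation": 9000},
    {"id": 301, "display_name": "user_c", "reputation": 42},
]


def filter_records(records, predicate):
    """Yield only the records for which predicate(record) is True."""
    for record in records:
        if predicate(record):
            yield record


# A very specific filter: look up one particular user by ID.
matches = list(filter_records(users, lambda r: r["id"] == 267))

# A more generic filter: many records may pass, but all share the property.
high_scores = list(filter_records(users, lambda r: r["reputation"] >= 100))

print(matches)
print(high_scores)
```

Both calls use the same mechanism and differ only in the predicate. In Pig Latin, the same kind of operation is written with the `FILTER ... BY` operator.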
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop distributed file system (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administrating clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups