Join Jack Dintruff for an in-depth discussion in this video, YARN, part of Data Analysis on Hadoop.
- [Voiceover] What we're going to look at next is YARN, which is Yet Another Resource Negotiator. And it's much more generic. Basically all YARN says is, okay you're asking for a container or several containers, they're gonna be this big, and they're gonna get this many CPU cores. That's really all YARN cares about. The user can write their code to do pretty much whatever they want, but they have to request those resources through the resource negotiator. Newer execution engines, things like Tez and Spark, have begun to leverage YARN in such a way that you can get major performance increases when compared to the MapReduce execution engine.
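The container request described above can be pictured with a small toy model. This is only a sketch for illustration, not YARN's actual API: the names `ContainerRequest` and `ToyResourceNegotiator`, and the cluster sizes used, are all hypothetical. It shows the one thing the negotiator cares about here, namely how much memory and how many CPU cores each container asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical request: 'count' containers of a given size."""
    memory_mb: int   # how big each container is
    vcores: int      # how many CPU cores each container gets
    count: int = 1   # how many containers are requested


class ToyResourceNegotiator:
    """Toy stand-in for a resource negotiator (not the real YARN API)."""

    def __init__(self, total_memory_mb: int, total_vcores: int) -> None:
        self.free_memory_mb = total_memory_mb
        self.free_vcores = total_vcores

    def allocate(self, request: ContainerRequest) -> int:
        """Grant as many requested containers as fit; return how many."""
        granted = 0
        for _ in range(request.count):
            fits_memory = request.memory_mb <= self.free_memory_mb
            fits_cores = request.vcores <= self.free_vcores
            if not (fits_memory and fits_cores):
                break
            self.free_memory_mb -= request.memory_mb
            self.free_vcores -= request.vcores
            granted += 1
        return granted


# Assumed cluster capacity, chosen only for the example.
negotiator = ToyResourceNegotiator(total_memory_mb=8192, total_vcores=8)

# Ask for three 2 GB containers with two cores each.
first = negotiator.allocate(ContainerRequest(memory_mb=2048, vcores=2, count=3))

# Ask for two more of the same size; only one still fits.
second = negotiator.allocate(ContainerRequest(memory_mb=2048, vcores=2, count=2))

print(first, second, negotiator.free_memory_mb, negotiator.free_vcores)
```

The model does not care what runs inside a container, which mirrors the point that the user's code can do pretty much whatever it wants once the resources have been granted.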
And so we'll see this later on in the Hive chapter, where we will be leveraging Tez instead of MapReduce to do some very fast data processing. In general YARN is just a next generation, more extensible version of MapReduce, but they are not mutually exclusive, meaning that you can have YARN and MapReduce running on the same cluster.
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop Distributed File System (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administrating clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups