Join Jack Dintruff for an in-depth discussion in this video, "Using the exercise files," part of Data Analysis on Hadoop.
- [Voiceover] The exercise files for this course are delivered in a single-cluster Hadoop environment. If you have access to the exercise files for this course and the CPU resources to install and run a virtual machine, you can download the split RAR files provided to follow along with the author. First, install VirtualBox. Then download a Hortonworks sandbox. To extract the split RAR files, be sure you have downloaded either The Unarchiver for Mac or WinRAR for Windows.
For Windows users, your exercise files have been compressed using WinRAR. To decompress these files and follow along with the author, select part one, which should open the VMDK and VBOX files. I'll select both and extract them to a specified folder. We'll put this on the desktop and begin the decompression. Once the decompression is complete, I'll show you how to access each of the exercise files using the command line.
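Extraction only succeeds when every volume of a split RAR archive is present, so it can help to confirm that no part is missing before opening part one. The Python sketch below is a hypothetical helper, not part of the course files. It assumes the `name.part1.rar`, `name.part2.rar` volume naming scheme, and the function name `missing_volumes` is invented for illustration.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumption: volumes follow the "<base>.part<N>.rar" naming scheme.
VOLUME_RE = re.compile(r"^(?P<base>.+)\.part(?P<num>\d+)\.rar$", re.IGNORECASE)


def missing_volumes(filenames: Iterable[str]) -> Dict[str, List[int]]:
    """Map each archive base name to the part numbers missing below its highest part.

    A missing *final* volume cannot be detected from file names alone,
    because nothing in the names says how many parts there should be.
    """
    parts: Dict[str, Set[int]] = {}
    for name in filenames:
        match = VOLUME_RE.match(name)
        if match:  # ignore files that are not RAR volumes
            parts.setdefault(match["base"], set()).add(int(match["num"]))

    gaps: Dict[str, List[int]] = {}
    for base, numbers in parts.items():
        expected = set(range(1, max(numbers) + 1))
        missing = sorted(expected - numbers)
        if missing:
            gaps[base] = missing
    return gaps
```

Working on plain file names rather than on the file system keeps the check easy to test. To scan a real folder, pass it something like `p.name for p in Path(folder).iterdir()` (with `Path` from `pathlib`), where `folder` is wherever you saved the downloaded parts.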
Once the download is complete, go ahead and power on the machine. This will take a little bit of time. Then we'll press Alt + F5 to get into the machine…
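If you prefer to script these steps instead of using the VirtualBox window, the Python sketch below builds, and optionally runs, two standard VBoxManage commands: `registervm` to add the extracted .vbox file to VirtualBox, and `startvm` to power the machine on. The file path and VM name are placeholders (the name must match the one stored in the .vbox file), and the helper names `sandbox_commands` and `run` are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def sandbox_commands(vbox_file: str, vm_name: str) -> List[List[str]]:
    """Return the VBoxManage calls that register and start an extracted VM."""
    # Use an absolute path so the result does not depend on the working directory.
    vbox_path = str(Path(vbox_file).expanduser().resolve())
    return [
        ["VBoxManage", "registervm", vbox_path],
        # "--type gui" opens a console window, where Alt + F5 can be pressed.
        ["VBoxManage", "startvm", vm_name, "--type", "gui"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError if VBoxManage reports a failure.
            subprocess.run(command, check=True)
```

With the default `dry_run=True`, a call such as `run(sandbox_commands("Sandbox.vbox", "Hortonworks Sandbox"))` only prints the commands, so you can review them before letting the script touch VirtualBox.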
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop distributed file system (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
- Setting up and administering clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups