Join Jack Dintruff for an in-depth discussion in this video, Need to know, part of Data Analysis on Hadoop.
- [Voiceover] This is an advanced course for people who already have their own development cluster and can follow along by downloading the three data sets and independently importing them into HDFS. This course is for people who already have a background in coding and are interested in learning the tools used to process large volumes of data at scale. The author will be using a VirtualBox image, which contains all of the exercise files needed to follow along. If you are not familiar with Hadoop terminology, such as mapper and reducer, I encourage you to first watch the course Hadoop Fundamentals by Lynn Langit.
In this course, you will leverage both Pig and Hive, although no prior knowledge of either is required. All of the data in this course is from a snapshot of StackOverflow.com and reflects the state of all of their communities, including users, comments, and posts, at a point in time.
In this course, software engineer and data scientist Jack Dintruff goes beyond the basic capabilities of Hadoop. He demonstrates hands-on, project-based, practical skills for analyzing data, including how to use Pig to analyze large datasets and how to use Hive to manage large datasets in distributed storage. Learn how to configure the Hadoop distributed file system (HDFS), perform processing and ingestion using MapReduce, copy data from cluster to cluster, create data summarizations, and compose queries.
Topics include:
- Setting up and administering clusters
- Ingesting data
- Working with MapReduce, YARN, Pig, and Hive
- Selecting and aggregating large datasets
- Defining limits, unions, filters, and joins
- Writing custom user-defined functions (UDFs)
- Creating queries and lookups