Get up to speed with Hadoop. Learn tips and tricks for doing data science work in this popular big data platform.
- [Instructor] Hadoop is becoming the standard for many companies looking to warehouse their data and then analyze it. There are so many components and different parts of the ecosystem that you can easily become a specialist in just one area of Hadoop. Over the years, I've learned some of the most common tips and tricks to help you get going in Hadoop, and that's what we're going to look at in this course. Hi, I'm Ben Sullins, and I'm going to start this course by walking you through some of the basic file management techniques in Hadoop.
Then, we'll take a look at how to access and analyze that data from Hive, the SQL engine for Hadoop, and lastly, we'll dive into some techniques for running fast queries in Hive. We'll be covering all of these topics and more to get you up to speed with Apache Hadoop. Let's dive in.
- Working with files
- Organizing files in HDFS
- Connecting to Hadoop
- Exploring Hive through Beeline
- Accessing Hive from Python
- Creating aggregates in Hive
- Selecting partitions in Hive
- Complex data structures in Hive
- Mapping data in Hive
- Creating flat tables for Impala
- Deconstructing Impala queries