Understand how the Hadoop ecosystem is changing and get introduced to key capabilities covered in this course.
- [Instructor] As we begin our journey looking at Advanced Hadoop, we're going to start with what I call Modern Hadoop. You might be surprised to know that both open source and commercial Hadoop are over 10 years old. What I'm finding, though, is that adoption velocity is being driven by innovations, both in public cloud services around Hadoop, such as those offered not only by Amazon but also by Microsoft and Google, and in other devices and domains. Ones that I've had experience with over the last 12 to 18 months are IoT, or Internet of Things, and genomics, around bioinformatics for personalized cancer treatment.
Now we'll talk about these domains, but I'll also generalize, because as the Hadoop ecosystem matures it has more and more applicability. Let's look at the core components, kind of as a review. So we have storage, compute, and management. In the area of storage, we have the Hadoop File System, and then we have some vendor optimizations. One we'll be using in this course is from the vendor Databricks, which has a commercial version of Hadoop with the library Apache Spark, and that's the Databricks File System. We'll also look at using cloud-based file systems, such as S3 from Amazon and the Google Cloud Storage file system.
These are called data lakes. In the compute area, my expectation, as I mentioned in an earlier movie, is that you will be familiar with the core MapReduce paradigm that Hadoop was originally built on. We're going to be focusing on some of the newer compute libraries that are available. In particular, we're going to look at Apache Spark, which allows compute processes to be run in the memory of the worker nodes and can significantly increase the processing speed of Hadoop jobs. We'll also look at some of the other popular libraries out there, such as Storm, which is now becoming Heron, from Twitter.
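As a quick refresher on the MapReduce paradigm mentioned above, here is a minimal single-machine sketch of a word count in plain Python. It is not Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the three conceptual stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three stages in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Illustrative input only; a real job would read from a file system.
    sample = ["spark runs in memory", "hadoop runs on disk"]
    print(word_count(sample))
```

On a real cluster the map and reduce stages run in parallel across worker nodes. Classic MapReduce writes intermediate results to disk between stages, whereas Spark can keep them in worker memory, which is where the speed-up described above typically comes from.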
In addition to this, we'll consider some of the management capabilities of Modern Hadoop. These include YARN and Mesos.
Released 7/5/2017
- Relate which file system is typically used with Hadoop.
- Explain the differences between Apache and commercial Hadoop distributions.
- Cite how to set up IDE - VS Code + Python extension.
- Relate the value of Databricks community edition.
- Compare YARN vs. Standalone.
- Review various streaming options.
- Recall how to select your programming language.
- Describe the Databricks environment.
Skill Level Intermediate
Introduction
- Welcome (53s)
1. Hadoop Core Fundamentals
- Modern Hadoop (1m 53s)
- Hadoop libraries (1m 23s)
- Run Hadoop job on GCP (1m 52s)
- Databricks on AWS (2m 32s)
2. Setting Up a Hadoop Dev Environment
- Load data into tables (1m 51s)
3. Hadoop Batch Processing
- Processing options (1m 2s)
- Resource coordinators (1m 30s)
- Compare YARN vs. Standalone (1m 30s)
4. Fast Hadoop Options
- Big data streaming (1m 57s)
- Streaming options (1m 10s)
- Apache Spark basics (1m 46s)
- Spark use cases (1m 2s)
5. Spark Basics
- Apache Spark libraries (3m 24s)
- Spark shell (1m 53s)
6. Using Spark
- Tour the notebook (5m 29s)
- Import and export notebooks (2m 56s)
- Calculate pi on Spark (8m 19s)
- Import data (2m 50s)
- Transformations and actions (4m 43s)
- Caching and the DAG (6m 49s)
7. Spark Libraries
- Spark SQL (8m 34s)
- SparkR (6m 11s)
- Spark ML: Preparing data (4m 21s)
- Spark ML: Building the model (3m 50s)
- MXNet or TensorFlow (2m 30s)
- Spark with GraphX (2m 12s)
8. Spark Streaming
- Spark streaming (4m 21s)
9. Hadoop Streaming
- Pub/Sub on GCP (3m 59s)
- Apache Kafka (1m 26s)
- Kafka architecture (1m 6s)
- Apache Storm (1m 30s)
- Storm architecture (1m 36s)
10. Modern Hadoop Architectures
Conclusion
- Next steps (26s)