Lynn discusses examples of business needs that are driving the demand for fast Hadoop solutions.
- [Instructor] So change in any ecosystem is driven by customer need, and it's the same in the world of fast Hadoop. As more and more data is generated and can be more cheaply and easily stored, whether it's locally or, more commonly, in the new cloud-based data lakes, whether they're on Amazon S3, or Google, or Azure, or something else, customers increasingly want fast analysis of these huge lakes of data. Example use cases are around scenarios such as stream data processing: streaming extract, transform, and load, in other words, taking data that's coming in, processing it, flagging exceptions, outliers, or bad data, and responding.
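That streaming-ETL idea can be sketched in a few lines of plain Python, independent of any particular streaming framework. The `reading` field and the 0–100 valid range below are hypothetical examples, not from the course:

```python
# Streaming ETL sketch: validate incoming records, routing bad data
# and outliers to separate sinks. Field names and bounds are hypothetical.

def validate(record):
    """Classify one incoming record as ok, bad, or outlier."""
    value = record.get("reading")
    if value is None:
        return "bad"          # missing data -> route to an error sink
    if not 0 <= value <= 100:
        return "outlier"      # outside the expected range -> flag for review
    return "ok"

def process_stream(records):
    """Partition a stream of records by validation result."""
    routed = {"ok": [], "bad": [], "outlier": []}
    for record in records:
        routed[validate(record)].append(record)
    return routed

incoming = [{"reading": 42}, {"reading": 250}, {}]
routed = process_stream(incoming)
# routed["ok"] has one record, routed["outlier"] one, routed["bad"] one
```

In a real deployment the same classify-and-route step would run inside a streaming engine such as Spark, with the flagged records driving the "responding" part of the pipeline.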
Data enrichment: taking data that's streaming in, maybe combining it with other data sources, whether they're authoritative or evaluative like machine learning, and adding tagged data so that the data can then be further analyzed downstream. Another use case I've seen from my customers is trigger event detection. This can be around devices, or around systems, where thresholds are exceeded. And the idea here is speed: the data coming in can be quickly evaluated as an anomaly.
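Trigger event detection reduces to a per-device threshold check over the incoming stream. Here is a minimal sketch; the device names and threshold values are made up for illustration:

```python
# Trigger event detection sketch: emit an event whenever a device reading
# exceeds its configured threshold. Device IDs/limits are hypothetical.

THRESHOLDS = {"pump-1": 80.0, "sensor-7": 50.0}

def detect_events(readings, thresholds=THRESHOLDS):
    """Return (device, value) pairs whose value exceeds that device's threshold."""
    events = []
    for device, value in readings:
        limit = thresholds.get(device)
        if limit is not None and value > limit:
            events.append((device, value))
    return events

stream = [("pump-1", 75.2), ("sensor-7", 61.0), ("pump-1", 92.4)]
alerts = detect_events(stream)
# alerts == [("sensor-7", 61.0), ("pump-1", 92.4)]
```

The "fast" part of fast Hadoop is running exactly this kind of evaluation over data as it arrives, rather than in a later batch pass.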
We are probably all familiar with a long-time user of a fast Hadoop system, which is commercial credit card services. I know I'm glad when I get that phone call that says, "Hey, this purchasing pattern seems to be out of normal behavior, has your card been compromised?" This kind of functionality is coming to more and more businesses as the Hadoop ecosystem evolves. Another set of functionalities is around machine learning, specifically around sentiment analysis. A really common use case that pretty much everyone's familiar with is Twitter sentiment.
Bringing that information, via linguistic analysis, more quickly to, for example, a commercial client can result in turning around customers who've had bad experiences and fixing those situations, or bringing new customers on, people who are looking for a service and can't find it. Another area where I've done some work is what's called fog computing. It's kind of a funny name, but the idea is that you have IoT devices in some sort of secondary location, could be a commercial business or even your home, that are not directly connected to the public internet all the time.
Maybe they're indirectly connected through an edge layer; in the case of your home, that could be your router. So the fog is the edge, it's kind of a strange name, but the idea is to get the information from the devices, the event information from the edge, more quickly so that you can, again, respond. So, for example, if it's raining outside, you can turn your sprinklers off so you're not wasting water. The whole idea is to take in more information than you're getting today and to respond to it more quickly.
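That edge-layer responsiveness amounts to a local rule that maps device events to actions without a round trip to the cloud. A tiny sketch of the sprinkler example, with event fields and action names invented for illustration:

```python
# Fog/edge sketch: react to local device events at the edge layer.
# The event shape and the rain-vs-sprinkler rule are hypothetical examples.

def edge_rule(event):
    """Map an incoming edge event to a local action, or None if no action."""
    if event.get("type") == "weather" and event.get("raining"):
        return "sprinklers_off"   # rain detected -> stop watering locally
    return None

events = [{"type": "weather", "raining": True},
          {"type": "weather", "raining": False}]
actions = [edge_rule(e) for e in events]
# actions == ["sprinklers_off", None]
```

The point of running this at the edge (the "fog") is latency: the decision happens next to the devices, with the event data forwarded upstream afterward for the heavier analysis.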
Author: Lynn Langit
Released: 7/5/2017
Skill Level: Intermediate

Learning objectives:
- Relate which file system is typically used with Hadoop.
- Explain the differences between Apache and commercial Hadoop distributions.
- Describe how to set up an IDE (VS Code + Python extension).
- Relate the value of Databricks Community Edition.
- Compare YARN vs. Standalone.
- Review various streaming options.
- Recall how to select your programming language.
- Describe the Databricks environment.
Related Courses:
- Learning Hadoop, with Lynn Langit (4h 48m, Beginner)
- Apache Spark Essential Training, with Ben Sullins (1h 27m, Intermediate)
Introduction
- Welcome (53s)

1. Hadoop Core Fundamentals
- Modern Hadoop (1m 53s)
- Hadoop libraries (1m 23s)
- Run Hadoop job on GCP (1m 52s)
- Databricks on AWS (2m 32s)

2. Setting Up a Hadoop Dev Environment
- Load data into tables (1m 51s)

3. Hadoop Batch Processing
- Processing options (1m 2s)
- Resource coordinators (1m 30s)
- Compare YARN vs. Standalone (1m 30s)

4. Fast Hadoop Options
- Big data streaming (1m 57s)
- Streaming options (1m 10s)
- Apache Spark basics (1m 46s)
- Spark use cases (1m 2s)

5. Spark Basics
- Apache Spark libraries (3m 24s)
- Spark shell (1m 53s)

6. Using Spark
- Tour the notebook (5m 29s)
- Import and export notebooks (2m 56s)
- Calculate pi on Spark (8m 19s)
- Import data (2m 50s)
- Transformations and actions (4m 43s)
- Caching and the DAG (6m 49s)

7. Spark Libraries
- Spark SQL (8m 34s)
- SparkR (6m 11s)
- Spark ML: Preparing data (4m 21s)
- Spark ML: Building the model (3m 50s)
- MXNet or TensorFlow (2m 30s)
- Spark with GraphX (2m 12s)

8. Spark Streaming
- Spark streaming (4m 21s)

9. Hadoop Streaming
- Pub/Sub on GCP (3m 59s)
- Apache Kafka (1m 26s)
- Kafka architecture (1m 6s)
- Apache Storm (1m 30s)
- Storm architecture (1m 36s)

10. Modern Hadoop Architectures

Conclusion
- Next steps (26s)
Video: Fast Hadoop use cases