Stream processing is a technology that is growing in popularity for large-scale real-time processing. In this video, learn about stream processing and how it differs from batch processing.
- [Instructor] I will start off by introducing the concept of stream processing in this video. Stream processing is becoming more and more popular for handling big data and delivering actions in real time. Stream processing deals with the ability to understand and process a continuous stream of data and produce insights in real time. Two concepts stand out in this definition: a continuous stream of data, and processing in real time.
What are some of the key characteristics of stream processing that differentiate it from batch processing? First, stream processing deals with unbounded datasets. While processing a given record in the dataset, it is not possible to know how many more records exist in the stream. The unbounded stream continues forever. Stream processing is done one record at a time. Each record is inspected, transformed, and analyzed. It is also possible to create windows over the stream that provide additional summaries. Computations on streams are real-time. Records are processed, insights are generated, and results are pushed to the next stage in real time. Stream processing has low latency, where the entire processing happens in sub-second time intervals. There should be no visible lag from raw data ingestion to delivery of insights. It should enable parallel processing in order to scale across large quantities of data in real time.
Stream processing is done by building pipelines. A pipeline consists of streaming inputs, processing jobs, and streaming outputs. This picture shows a typical processing pipeline. There will be multiple streaming inputs, possibly in different formats. Processing tasks will cleanse, process, and transform input data, and then push the results to streaming outputs. Inputs may be combined to deliver insights. Outputs of one processing task can become the input for another processing task. A network of tasks helps deliver the goals of the stream processing pipeline.
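The record-at-a-time and windowing ideas described above can be sketched in plain Python, without any streaming framework. This is only an illustration under stated assumptions: the source `sensor_stream`, the record fields `id` and `value`, the rule that a value of 0 is invalid, and the count-based window of four records are all hypothetical and are not taken from the video. Real streaming engines usually window by time rather than by record count.

```python
import itertools
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[dict]:
    """Hypothetical unbounded input: yields records forever."""
    for i in itertools.count():
        yield {"id": i, "value": i % 5}


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Processing task: handle one record at a time."""
    for record in records:
        # Inspect: drop readings we assume to be invalid (value 0).
        if record["value"] == 0:
            continue
        # Transform: rescale the value and pass the record downstream.
        yield {**record, "value": record["value"] * 10}


def tumbling_window_sums(records: Iterable[dict], size: int) -> Iterator[int]:
    """Summarize each consecutive, non-overlapping window of `size` records."""
    window = []
    for record in records:
        window.append(record["value"])
        if len(window) == size:
            yield sum(window)
            window = []


# Chain the stages into a pipeline: input -> task -> windowed output.
pipeline = tumbling_window_sums(transform(sensor_stream()), size=4)

# The stream never ends, so take only the first three window summaries.
first_three = list(itertools.islice(pipeline, 3))
print(first_three)
```

Because each stage is a generator, no stage needs to know how many records remain, and the output of one task feeds directly into the next, which mirrors the pipeline structure described in the transcript.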
Having discussed the characteristics of stream processing, let's now look at the opportunities and challenges of this technology in the next video.
- Streaming opportunities and challenges
- Setting up the environment
- Streaming analytics with Spark
- Monitoring alerts and thresholds with Spark
- Creating leaderboards with Spark
- Generating real-time predictions with Spark
- Hands-on Spark streaming project