Spark Streaming opens up a lot of opportunities in real-time processing, but also comes with some challenges. In this video, learn the key opportunities and challenges that stream processing brings to big data.
- [Instructor] Stream processing provides a number of new opportunities for real-time insights, but it also poses challenges. Let's review them in this video. What are some of the key opportunities provided by stream processing? First, it provides the ability to process big data in real time. Using parallel processing, significant volumes of data can be processed and delivered. Streaming also enables data marshaling in real time: analyzing incoming data and deciding where to direct it based on specific use cases and scenarios. It enables real-time analytics. Data can be analyzed to generate insights in real time, which in turn can drive real-time actions. Data can be checked against set thresholds in real time, and alerts can be generated. This can provide critical functionality for real-time resiliency. Leaderboards can be maintained in real time to show the top trending elements, which has significant uses in gaming and operational dashboards. Finally, predictions can be made on incoming data using machine learning models, and these predictions can be delivered in real time to destinations to drive actions.

But what are some of the challenges of real-time stream processing? The first challenge is the unbounded memory needed to handle unbounded data. It is not easy to predict and control the memory requirements of incoming data. Horizontal scaling of streaming pipelines as incoming data grows and fluctuates is also a challenge: the pipeline should scale up and down as incoming data volumes change. A lot of analytics also need to look back beyond the current record. Periodic summaries, like five-second summaries, need to look back at earlier records, window them based on their timestamps, and then aggregate them. State management is another key challenge, especially in distributed processing.
How do we maintain state per entity across a distributed processing network, and store and access it in real time? Finally, while stream processing allows for ad hoc analytics, optimizing ad hoc queries in real time is also a challenge. Fortunately, today's stream processing frameworks, such as Apache Spark, Apache Flink, and Kafka Streams, solve many of these problems for us. They provide out-of-the-box capabilities that help manage these challenges and deliver stream processing.
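The windowing, threshold, and leaderboard ideas discussed in this video can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: it assumes events arrive as hypothetical `(timestamp_in_seconds, key)` tuples, groups them into five-second tumbling windows, flags keys whose per-window count exceeds an assumed threshold, and builds a small top-k leaderboard for each window. The names `summarize`, `window_start`, `WINDOW_SECONDS`, and `ALERT_THRESHOLD` are illustrative only.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5    # assumed window size, matching the five-second summary example
ALERT_THRESHOLD = 3   # hypothetical threshold: alert when a key appears more often than this per window


def window_start(ts: float, size: int = WINDOW_SECONDS) -> int:
    """Map an event timestamp (seconds) to the start of its tumbling window."""
    return int(ts // size) * size


def summarize(events, size=WINDOW_SECONDS, threshold=ALERT_THRESHOLD, top_k=2):
    """Aggregate (timestamp, key) events per tumbling window.

    Returns a tuple of:
      counts       -- {window_start: {key: count}}
      alerts       -- [(window_start, key, count)] where count > threshold
      leaderboards -- {window_start: [(key, count), ...]} with the top_k keys
    """
    windows = defaultdict(Counter)  # window start -> per-key event counts
    for ts, key in events:
        windows[window_start(ts, size)][key] += 1

    counts = {start: dict(c) for start, c in windows.items()}
    alerts = [
        (start, key, n)
        for start in sorted(windows)
        for key, n in windows[start].items()
        if n > threshold
    ]
    # most_common orders by count, breaking ties by first occurrence
    leaderboards = {start: c.most_common(top_k) for start, c in windows.items()}
    return counts, alerts, leaderboards


if __name__ == "__main__":
    sample = [
        (0.5, "login"), (1.2, "login"), (2.0, "click"), (3.9, "login"),
        (4.1, "login"), (5.0, "click"), (6.3, "login"), (7.7, "click"),
        (9.9, "click"),
    ]
    counts, alerts, boards = summarize(sample)
    print(counts)
    print(alerts)  # [(0, 'login', 4)]
    print(boards)
```

Because this sketch keeps every window in a dictionary, its memory grows with the data, which is exactly the unbounded-memory problem described earlier. In Spark Structured Streaming, the comparable building blocks are the `window` function in `pyspark.sql.functions` and `DataFrame.withWatermark`, which lets the engine drop state for windows older than a late-data threshold.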
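The question of maintaining state per entity across a distributed network can be sketched in the same way. The plain-Python example below is an illustration under stated assumptions, not how Spark, Flink, or Kafka Streams are implemented: it assumes a hypothetical cluster of `NUM_WORKERS` workers, each holding a local dictionary of per-entity state, and a stable hash (`zlib.crc32`) that routes every event for a given key to the same worker, so that one worker alone owns, updates, and serves that key's state. `Worker`, `Cluster`, and `partition_for` are invented names.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 3  # hypothetical cluster size for this sketch


def partition_for(key: str, num_workers: int = NUM_WORKERS) -> int:
    """Route a key to a worker index with a stable hash.

    zlib.crc32 gives the same result in every process, unlike the built-in
    hash(), which is randomized per process for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


class Worker:
    """One processing node that owns the state for the keys routed to it."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        self.state = defaultdict(float)  # entity key -> running total

    def process(self, key: str, amount: float) -> float:
        """Update this entity's state with a new event and return the new value."""
        self.state[key] += amount
        return self.state[key]


class Cluster:
    """Routes events and reads to the single worker that owns each key."""

    def __init__(self, num_workers: int = NUM_WORKERS):
        self.workers = [Worker(i) for i in range(num_workers)]

    def _owner(self, key: str) -> Worker:
        return self.workers[partition_for(key, len(self.workers))]

    def send(self, key: str, amount: float) -> float:
        return self._owner(key).process(key, amount)

    def lookup(self, key: str) -> float:
        # .get avoids creating an empty entry for unknown keys
        return self._owner(key).state.get(key, 0.0)


if __name__ == "__main__":
    cluster = Cluster()
    for key, amount in [("alice", 10), ("bob", 5), ("alice", 2.5), ("bob", 5)]:
        cluster.send(key, amount)
    print(cluster.lookup("alice"))  # 12.5
    print(cluster.lookup("bob"))    # 10.0
```

Two hard parts are left out. The first is durability: Spark Structured Streaming, for example, keeps operator state in a state store backed by the query's checkpoint location. The second is rescaling: changing the number of workers under modulo hashing remaps most keys, and their state has to move with them. That second point is where state management meets the horizontal-scaling challenge.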
- Streaming opportunities and challenges
- Setting up the environment
- Streaming analytics with Spark
- Monitoring alerts and thresholds with Spark
- Creating leaderboards with Spark
- Generating real-time predictions with Spark
- Hands-on Spark streaming project