Learn about some key design issues and recommendations for the individual components within the social media sentiment analysis use case.
- [Instructor] Let us now deep dive into some key design considerations for this architecture. Let us start with designing Apache Spark components for real-time processing. Each post received in the queue is independent of other posts. Also, sequencing of posts is not important in this use case. This means each post can be cleansed and processed independently in a separate Spark partition.
This can be done within a map function. The call to the sentiment analysis engine can also be made within the map function. There will be as many parallel threads as the number of Spark partitions. The sentiment of the tweets can be summarized through a reduce function for every micro batch. Given that the allowed latency or response time is in minutes, it is okay to have a batch size of a minute or so to keep the resource utilization optimal.
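The map-then-reduce flow described here can be sketched in plain Python. This is only an illustration of the pattern, not Spark code: `cleanse`, `score_sentiment`, `process_post`, and `summarize_batch` are hypothetical names, and the keyword lookup is a toy stand-in for the call to a real sentiment analysis engine. In Spark, the same per-post function would be applied in a map step across partitions and the per-post counts combined in a reduce step for each micro batch.

```python
from collections import Counter
from functools import reduce

# Toy word lists standing in for a real sentiment analysis engine (assumption).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}


def cleanse(post: str) -> str:
    """Lower-case the post and drop punctuation (placeholder cleansing)."""
    kept = (ch for ch in post.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def score_sentiment(text: str) -> str:
    """Hypothetical engine call: label text by counting known words."""
    words = set(text.split())
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def process_post(post: str) -> Counter:
    """Map step: each post is cleansed and scored independently of the others."""
    return Counter({score_sentiment(cleanse(post)): 1})


def summarize_batch(posts) -> Counter:
    """Reduce step: add up the per-post counts for one micro batch."""
    return reduce(lambda a, b: a + b, map(process_post, posts), Counter())


if __name__ == "__main__":
    batch = ["I love this!", "Awful service", "Just arrived"]
    print(summarize_batch(batch))
```

Because `process_post` depends only on its own input, the order in which posts are processed does not affect the summary, which is what allows the work to be spread across partitions.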
For a load of 100K posts per day, very small batch sizes might not be optimal, as each batch will have one or no posts to process. Keep the number of Spark partitions equal…
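The point about small batches follows from simple arithmetic, sketched below. The function name `expected_posts_per_batch` is hypothetical, and the calculation assumes posts arrive evenly through the day, which real traffic will not do exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def expected_posts_per_batch(posts_per_day: float, batch_seconds: float) -> float:
    """Average posts per micro batch, assuming a uniform arrival rate."""
    return posts_per_day * batch_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for batch_seconds in (1, 10, 60):
        avg = expected_posts_per_batch(100_000, batch_seconds)
        print(f"{batch_seconds:>3}s batch: about {avg:.1f} posts on average")
```

At 100K posts per day, a one-second batch averages a little over one post, while a one-minute batch averages roughly 69, which gives each batch enough work to justify its scheduling overhead.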
There is no coding involved. Instead you will see how big data tools can help solve some of the most complex challenges for businesses that generate, store, and analyze large amounts of data. The use cases are drawn from a variety of industries, including ecommerce and IT. Instructor Kumaran Ponnambalam shows how to analyze a problem, draw an architectural outline, choose the right technologies, and finalize the solution. After each use case, he reviews related best practices for real-time streaming, predictive analytics, parallel processing, and pipeline management. Each lesson is rich in practical techniques and insights from a developer who has experienced the benefits and shortcomings of these technologies firsthand.
- Components of a big data application
- Big data app development strategies
- Use cases: fraud detection and product recommendations
- Technology options
- Designing solutions
- Best practices