Discuss some best practices while building data transport modules within the big data architecture.
- [Instructor] When we build big data pipelines, it is not enough just to build them for efficient data processing. We also need to enable your operations folks to easily manage the pipeline. We will see some best practices for pipeline management in this video. So, what are the best practices for pipeline management? Your pipeline typically has several kinds of clusters: Kafka clusters, Spark clusters, database clusters, and also custom service clusters.
It is a good idea to use best-of-breed cluster managers like YARN and Mesos to manage the clusters. These cluster managers should provide key capabilities like monitoring cluster health, scheduling jobs, monitoring jobs, managing failover within the cluster, scaling with additional nodes, and reporting on cluster health. Another key operational issue with asynchronous pipelines is the backlog that builds up in the various queues.
It is a good idea to have a monitoring system to keep track of the backlog and keep it at manageable levels. An ever-increasing backlog means you will never catch up,…
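The course itself stays at the level of tools, but the backlog check described above can be illustrated with a minimal sketch. It assumes each queue partition exposes a latest written offset and a consumer's committed offset (Kafka reports consumer lag this way); the names `PartitionOffsets`, `total_backlog`, and `backlog_status`, the `max_backlog` threshold, and the three-sample window are hypothetical choices for illustration, not part of any specific monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartitionOffsets:
    # Hypothetical snapshot of one queue partition (assumed inputs).
    end_offset: int        # latest offset written by producers
    committed_offset: int  # latest offset processed by consumers


def total_backlog(snapshot: Dict[str, PartitionOffsets]) -> int:
    """Sum of unprocessed messages across all partitions in one snapshot."""
    return sum(max(p.end_offset - p.committed_offset, 0) for p in snapshot.values())


def backlog_status(history: List[int], max_backlog: int, window: int = 3) -> str:
    """Classify a time-ordered list of total-backlog samples (oldest first)."""
    if not history:
        return "unknown"
    current = history[-1]
    recent = history[-window:]
    # Strictly growing over the whole window: consumers are falling behind.
    growing = len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))
    if current > max_backlog:
        return "critical"      # backlog already above the manageable level
    if growing:
        return "falling_behind"  # trend alone warrants attention
    return "ok"


if __name__ == "__main__":
    snapshot = {
        "orders-0": PartitionOffsets(end_offset=1_000, committed_offset=950),
        "orders-1": PartitionOffsets(end_offset=800, committed_offset=800),
    }
    print(total_backlog(snapshot))                             # 50
    print(backlog_status([10, 40, 90], max_backlog=10_000))  # falling_behind
```

Checking the trend as well as an absolute threshold reflects the point about an ever-increasing backlog: a queue can be below its limit today and still be on a path where consumers never catch up.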
There is no coding involved. Instead you will see how big data tools can help solve some of the most complex challenges for businesses that generate, store, and analyze large amounts of data. The use cases are drawn from a variety of industries, including ecommerce and IT. Instructor Kumaran Ponnambalam shows how to analyze a problem, draw an architectural outline, choose the right technologies, and finalize the solution. After each use case, he reviews related best practices for real-time streaming, predictive analytics, parallel processing, and pipeline management. Each lesson is rich in practical techniques and insights from a developer who has experienced the benefits and shortcomings of these technologies firsthand.
- Components of a big data application
- Big data app development strategies
- Use cases: fraud detection and product recommendations
- Technology options
- Designing solutions
- Best practices