From the course: Big Data Analytics with Hadoop and Apache Spark
Integrating Hadoop and Spark
- In this video, I will review the benefits of using Hadoop and Spark together for big data analytics. Why is the combination of Hadoop and Spark so powerful? HDFS provides large-scale distributed data storage. Spark provides fast, large-scale processing of that same data. Together, they make an excellent combination for building data pipelines. Spark integrates natively with Hadoop and takes full advantage of that integration. For example, Spark can read and update HDFS data from multiple nodes in parallel. A number of data read optimizations reduce memory use and I/O. Spark can use HDFS for intermediate data caching. Also, YARN provides a single cluster management mechanism for both Hadoop and Spark workloads. So, my recommendation, especially for enterprise deployments, is to combine the processing power of Spark with the scalable storage of HDFS to build high-performance processing jobs. In this course, I will…
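
To make the combination described above concrete, here is a minimal PySpark sketch of a job that reads from HDFS, aggregates, and writes back to HDFS while running on YARN. It is an illustration under stated assumptions, not the course's own exercise code: the NameNode host `namenode`, port 8020, the paths `/data/sales` and `/data/sales_totals`, the columns `product` and `amount`, and the helper names `hdfs_uri`, `summarize_sales`, and `main` are all hypothetical. Selecting only the needed columns and filtering early are typical examples of the read optimizations mentioned, since Spark can usually push them down to a Parquet reader.

```python
def hdfs_uri(namenode: str, path: str, port: int = 8020) -> str:
    """Build an hdfs:// URI. Host, port, and path are placeholders (assumptions)."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def summarize_sales(spark, source_uri: str, target_uri: str) -> None:
    """Read Parquet from HDFS, total the amounts per product, write back to HDFS."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import functions as F

    sales = spark.read.parquet(source_uri)  # partitions are read in parallel by executors
    totals = (
        sales.select("product", "amount")   # read only the columns the job needs
        .where(F.col("amount") > 0)         # simple filter Spark may push down to the reader
        .groupBy("product")
        .agg(F.sum("amount").alias("total_amount"))
    )
    totals.write.mode("overwrite").parquet(target_uri)  # executors write in parallel


def main() -> None:
    """Entry point for a job submitted to a YARN-managed cluster."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hdfs-spark-sketch")  # hypothetical application name
        .master("yarn")                # let YARN allocate the executors
        .getOrCreate()
    )
    try:
        summarize_sales(
            spark,
            hdfs_uri("namenode", "/data/sales"),
            hdfs_uri("namenode", "/data/sales_totals"),
        )
    finally:
        spark.stop()
```

On a cluster, a script like this would typically call `main()` and be launched with `spark-submit --master yarn`. The URI helper is kept separate so the storage location can be changed without touching the processing logic.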