From the course: Big Data Analytics with Hadoop and Apache Spark

Integrating Hadoop and Spark

- In this video, I will review the benefits of using Hadoop and Spark together for big data analytics. Why is the combination of Hadoop and Spark so powerful? HDFS provides large-scale distributed data storage. Spark provides fast, large-scale processing of the same data. Together, they make an excellent combination for building data pipelines.

Spark integrates natively with Hadoop and makes good use of that integration. For example, Spark can read and update HDFS data from multiple nodes in parallel. A number of data read optimizations reduce memory use and I/O. Spark can use HDFS for intermediate data caching. Also, YARN provides a single cluster management mechanism for both Hadoop and Spark workloads.

So, my recommendation, especially for enterprise deployments, is to combine the processing power of Spark with the scalable storage of HDFS to build high-performance processing jobs. In this course, I will…
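The integration points described above (parallel HDFS reads and writes, read optimizations, and YARN as the cluster manager) can be sketched in a few lines of PySpark. This is a minimal illustration and not code from the course: the NameNode host and port, the `/data/sales` paths, and the `product` and `amount` columns are all hypothetical placeholders, and the sketch assumes the input is stored as Parquet.

```python
"""Sketch: read from HDFS, aggregate with Spark, write back to HDFS, on YARN."""


def hdfs_uri(path, namenode="namenode.example.com", port=8020):
    """Build an hdfs:// URI. Host and port are placeholders, not from the course."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def run_pipeline(spark, input_uri, output_uri):
    """Read Parquet from HDFS, filter and aggregate, write the result back."""
    # Parquet is columnar, so Spark can read only the columns used below and
    # push the filter down to the file scan, which reduces memory use and I/O.
    sales = spark.read.parquet(input_uri)
    totals = (
        sales.filter(sales["amount"] > 0)  # hypothetical column
        .groupBy("product")                # hypothetical column
        .sum("amount")
    )
    # Executors write their partitions to HDFS in parallel.
    totals.write.mode("overwrite").parquet(output_uri)
    return totals


def main():
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hdfs-spark-sketch")
        .master("yarn")  # assumes HADOOP_CONF_DIR points at the cluster config
        .getOrCreate()
    )
    run_pipeline(spark, hdfs_uri("/data/sales"), hdfs_uri("/data/sales_totals"))
    spark.stop()
```

To try it on a cluster, call `main()` at the end of the script and launch it with `spark-submit`. Keeping the Spark logic in `run_pipeline` and passing the session in makes it easy to run the same function against a local session while developing.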
