From the course: Big Data Analytics with Hadoop and Apache Spark
Apache Hadoop overview
- [Instructor] In this video, I will review the key features and the current state of technology for Apache Hadoop. Hadoop is an open-source technology that started the big data wave. It provides distributed data storage and computing using low-cost hardware. It can scale to petabytes of data and can run on clusters with hundreds of nodes. Hadoop mainly consists of two components. The first is the Hadoop Distributed File System, or HDFS, which provides data storage. The second is MapReduce, a programming model and implementation that provides distributed computing capabilities over data stored in HDFS. Where does Hadoop stand today? Let's look at HDFS and MapReduce separately. HDFS is still a very good option for cheap storage of large quantities of data. It provides scaling, security, and cost benefits that help in its adoption. It is most suitable for enterprises with in-house data centers that want to host the data within their network.…
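To make the MapReduce programming model mentioned above more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. It simulates the three conceptual steps (map, shuffle, reduce) in a single process for a word count, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names chosen for illustration. In a real cluster, the input lines would come from blocks stored in HDFS and each phase would run in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The point of the split is that the map and reduce functions only ever see one record or one key at a time, which is what lets a framework distribute them across many machines without changing the logic.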