From the course: Data Science Tools of the Trade: First Steps

Fundamentals

- Hadoop is an innovative open-source software solution that allows you to set up a computer cluster quickly. The two main components of Hadoop are MapReduce and HDFS, and together they are what makes Hadoop a distributed computing platform. HDFS stands for Hadoop Distributed File System. Beyond these two, there are also many tools built on MapReduce and HDFS that provide additional, specialized features. MapReduce divides a job across multiple computers and uses task trackers to make sure processing can be carried out on different servers. HDFS splits a big dataset into smaller, more manageable pieces and stores them on multiple computers. The main computer is called the master and the other computers are called slaves. Let's look at how Hadoop works. In a typical scenario, applications post jobs to a queue monitored by the Hadoop master, which in turn processes them one by one in the order they arrive. This way of handling tasks is called batch processing. Fault tolerance is…
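The narration describes the MapReduce flow: a map step turns raw input into key/value pairs, Hadoop groups those pairs by key (the shuffle), and a reduce step combines each group into a result. The sketch below simulates that flow for a word count on a single machine in plain Python; the sample documents, function names, and in-memory grouping are illustrative stand-ins, not Hadoop's actual API.

```python
# Minimal local simulation of the MapReduce word-count flow.
# On a real cluster, Hadoop runs many map and reduce tasks on
# different servers and handles the shuffle/sort between them.
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)

if __name__ == "__main__":
    # Tiny stand-in for a large dataset that HDFS would split across nodes.
    documents = [
        "big data needs big clusters",
        "hadoop splits big jobs",
    ]

    # Shuffle/sort: group every emitted pair by its key (the word).
    grouped = defaultdict(list)
    for line in documents:
        for word, count in map_phase(line):
            grouped[word].append(count)

    # Reduce: combine each group of counts into one total per word.
    for word in sorted(grouped):
        print(reduce_phase(word, grouped[word]))
```

On an actual cluster, the same mapper and reducer logic would run as separate tasks on different servers, typically written against Hadoop's Java MapReduce API or submitted as scripts through Hadoop Streaming.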
