From the course: Cloud Hadoop: Scaling Apache Spark

Apache or commercial Hadoop distros - Apache Spark Tutorial

From the course: Cloud Hadoop: Scaling Apache Spark

Start my 1-month free trial

Apache or commercial Hadoop distros

- [Instructor] As we're thinking about Hadoop clusters, of course, open source is where this all started. Most of my customers, though, other than startups, really don't use open source because they need tooling, they need management around it. And frankly, although the underlying structure changes, not a lot has been happening since I recorded my Hadoop Fundamentals in comparison to the amount of change in the commercial and public cloud distributions. Now, for commercial distributions, of course, Doug Cutting, who created Hadoop, and his team are at Cloudera, and they still are a market leader in many ways. I find they're used more often on premise, although obviously could be deployed into the cloud. The new player, kind of the shining star, is Databricks, which has created the Apache Spark libraries and has a commercial version. We'll be using that in this course 'cause that's what I'm using more and more with my customers. Now that actually sits on top of the Amazon cloud currently, so it spins up virtual machines on Amazon. But we'll get into that as we get into the demos. A new aspect of Hadoop distributions is that, in addition to the managed distribution that's been on the Amazon cloud for several years, EMR, or Elastic MapReduce, we now have some competition from Azure with HDInsight and most notably from Google with the Google Cloud Dataproc. We're going to take a look actually at Google Cloud Dataproc because I think it's an exciting new player, and it's been out less than, I think, six months in general availability at the time of this recording. But it is a really fast, really performant and has a great value. Of course, EMR is the market leader because it's been out for the longest. But I wanted to include a look at GCP, Google Cloud Dataproc, because it's really interesting in terms of speed and value.

Contents