From the course: Cloud Hadoop: Scaling Apache Spark

Add Hadoop libraries - Apache Spark Tutorial



- [Narrator] As we continue setting up our Hadoop workspace, I want to add a common use case. In addition to the libraries that are supplied by default with the cluster, things like MapReduce, Hive, and Spark, and the typical Hadoop libraries, you have the capability to add libraries, and this is great for trying out different scenarios. So I'm just going to add, as a demonstration, this Avro library. And if you're either not familiar, or don't remember, this is used in data loads. It's a serialization system, so it gives a compact, fast binary format, and it allows splitting and compression, which, if you have variable input sizes or big input files, can help to optimize the input and pipelining. So, this is just one of many types of libraries that you might use. You might use it on the front end, on the data load, or you might use it during the processing. But more than that, I want to show you the process of adding an external library to your cluster in this particular configuration.

So we're doing this before we spin up the cluster because we want the library to be available, and you'll remember that the clusters come and go. So the workspace has a storage space for libraries. So I'll show you how this works. I'll go into our workspace. And I'll go into Shared. And I'll go into Create and Library. Now the interface here is a little bit unintuitive, so let me take a minute to go through this. If you have a local JAR file, you could just upload it from here, that's kind of obvious. But what we want to do is pull in this file that's hosted in a public repository. So that's Maven. So inside of here, we want to pull this in, and so I'm going to just put in Avro. And then I'm going to go to Search Spark Packages and Maven Central. You have to be a little bit quick here, so I'm going to go ahead and switch this to Maven, and then I want to put in the Avro tools, underscore one point, there it is, one point eight point one.
So that's the latest one, and here you can select different releases. And then I'm going to click Select, and then I'm going to create the library. So what this is doing is going out to the public Maven repository and pulling down the version of the Avro tools library that you can then make available to your particular cluster. And I'll show you how to do that in a subsequent movie when we spin up our cluster. And so there is our library, and it's available for us.
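
To make the Maven step a bit more concrete, here is a minimal sketch of what "pulling down" a library from the public repository refers to. It assumes the Avro tools library selected in the search dialog corresponds to the Maven coordinate `org.apache.avro:avro-tools:1.8.1` (the narration only gives the name and version, so the group ID is an assumption), and it uses the standard Maven Central directory layout to work out where that JAR is hosted. The `MavenCoordinate` class and its methods are hypothetical helpers written for illustration, not part of the workspace UI or any Hadoop or Spark API.

```python
from dataclasses import dataclass

# Base URL of the public Maven Central repository.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class MavenCoordinate:
    """Hypothetical helper: a 'group:artifact:version' Maven coordinate."""

    group_id: str
    artifact_id: str
    version: str

    @classmethod
    def parse(cls, text: str) -> "MavenCoordinate":
        # A coordinate in this simple form has exactly three non-empty parts.
        parts = text.strip().split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected 'group:artifact:version', got {text!r}")
        return cls(*parts)

    def jar_url(self, base: str = MAVEN_CENTRAL) -> str:
        # Maven repositories store artifacts under the group ID with dots
        # turned into path separators, then artifact ID, then version.
        group_path = self.group_id.replace(".", "/")
        file_name = f"{self.artifact_id}-{self.version}.jar"
        return f"{base}/{group_path}/{self.artifact_id}/{self.version}/{file_name}"


# Assumed coordinate for the Avro tools release chosen in the narration.
coord = MavenCoordinate.parse("org.apache.avro:avro-tools:1.8.1")
print(coord.jar_url())
```

Running this only builds and prints the download location; it does not fetch anything. When you create the library from a Maven coordinate, the workspace resolves and stores the JAR for you, so a sketch like this is mainly useful for checking that a coordinate and version exist before you type them into the dialog.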
