Learn how to upload files to Hadoop from your local machine and how to download from remote locations.
[Instructor] Now let's take a look at how to actually upload files from our local system into HDFS. If you recall, I copied everything over into a new folder called hadoop for DS. I'll open that folder locally by going to Applications > System Tools > File Browser, double-clicking hadoop for DS, and opening the second file with gedit. I'll close that window so I can see what's going on. Now, in order to upload a single file, we're going to use the put command.
Here we have this clients.csv file, and we want to upload it to the clients folder that we created in the previous clip. So we'll type hadoop fs -put; this is the command that puts the file up there. We're already in the right directory, so I can type data/clients.csv, and then I want that to go to data/clients.
I'll hit Enter, and the command completes. I'm going to maximize this window a little so we don't get that extra line when we're running commands. Now, to take a look at this, I'll run hadoop fs -ls to list the files, and I'm just going to look at the clients folder. There you go; you can see that the clients.csv file was uploaded successfully.
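The single-file upload above can be recapped as a short shell session. The paths follow the narration (run from inside the course's working directory) and need a running Hadoop cluster, so treat this as a sketch rather than something to paste blindly:

```shell
# Upload a local CSV into the HDFS clients folder created earlier
hadoop fs -put data/clients.csv data/clients

# List the HDFS folder to confirm the upload
hadoop fs -ls data/clients
```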
Now we can upload some more, and we're going to do this with the sales files. I'll clear my screen and run hadoop fs -put again. Earlier I typed out the full path; now I'll show you a shortcut: the tilde (~), which stands for my user's home directory. From there it's /hadoop for DS, the directory we created earlier; under its data folder we have sales yearly, and in there the Cogsley Services sales data 2009 CSV. We're going to put that in the data/sales folder that we created in HDFS earlier.
Now we'll take a look at that one again with hadoop fs -ls data/sales, and you can see it's uploaded. Next, if I wanted to, I could download a zip file and load it into a folder directly. I'm already in the place where I want to download it, so I'm going to use a command-line tool called wget, and I'm going to copy the command from the exercise file. It downloads a zip file containing a CSV listing of all addresses in San Diego.
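The tilde-shortcut upload can be sketched the same way. The local folder and file names here are my best rendering of the spoken paths (note the quoting around the folder name, since it contains spaces), so adjust them to match your exercise files:

```shell
# ~ expands to the current user's home directory, so this avoids
# typing /home/<user> explicitly; file and folder names below are
# approximations of the spoken paths
hadoop fs -put ~/"hadoop for DS"/data/sales_yearly/CogsleyServices-SalesData-2009.csv data/sales

# Confirm the file landed in HDFS
hadoop fs -ls data/sales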
If I paste that in and hit Enter, you'll see it actually goes out to Amazon AWS and downloads the data. This is from openaddresses.io, where you can download address data for all kinds of cities in the US. Once that's done, I can run ls and see that I have san_diego.zip. You can unzip that file, and now that it's extracted, let's take a look at some of its lines using the head command; we'll look at the first 15 lines.
That's in the US CA San Diego CSV, and you can see that I do have a column header and what type of data is in there: the longitude, the latitude, the number, the street, the unit, all good information I could use to do things like make maps. Now I want to make an addresses folder in HDFS and then upload this data into it. Let me clear my screen first, just so we have a clean slate to work from, and run hadoop fs -mkdir data/addresses.
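The download-and-inspect steps look like this. The zip URL lives in the exercise file and is not reproduced here, and the extracted CSV name is an assumption based on the narration:

```shell
# Download the San Diego addresses archive
# (URL comes from the exercise file; this placeholder is not real)
wget <zip-url-from-exercise-file>

# Extract it and peek at the first 15 lines, header included
unzip san_diego.zip
head -n 15 us_ca_san_diego.csv   # extracted file name is an assumption

# Create the target folder in HDFS
hadoop fs -mkdir data/addresses
```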
Now that the folder is created, I want to move the CSV from my local environment into it, and I'll do this just like we did before: hadoop fs -put, the location of the local file, which is the US CA San Diego CSV, and then the new folder I created, data/addresses. If I want to check that the file is there, I'll run hadoop fs -ls data/addresses.
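Putting those last two commands together, again assuming the extracted CSV name matches the spoken one:

```shell
# Upload the extracted addresses CSV into the new HDFS folder
hadoop fs -put us_ca_san_diego.csv data/addresses

# Verify it is in place
hadoop fs -ls data/addresses
```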
And there you have it: that's how you can download files, unzip them, upload them into Hadoop, and make sure everything was put in its right place.