In the event you need to delete data, there are a couple ways to go about it. This will show the basic method and some things to avoid.
- [Instructor] For this tip, we're going to look at removing files in HDFS. In the previous one, we moved files around, so we have a few left over. Now, we could have done that using the move command, which would essentially delete a file from its original location and copy it to the new one, but I wanted to separate these because I want to make sure you understand there are some nuances to removing files when you're working in HDFS. So, if I want to remove a single file, I can just type hadoop fs -rm.
That's the command to remove it, and I give it the path, /data/clients/clients.csv. This is the one, if you recall, that we had already moved over to a new directory. And I'm running this in my terminal window here, with the exercise file listing all the commands we're going to run through on the left. So, I hit Enter. It was deleted; I got a good message there. Then, if I want to see what's in that directory, I can say hadoop fs -ls /data/clients.
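As a sketch, the two commands from this step look like this (they assume a running Hadoop cluster with the /data/clients directory from the earlier exercises already in HDFS):

```shell
# Delete a single file from HDFS. The path matches the file moved
# in the previous exercise; adjust it for your own cluster.
hadoop fs -rm /data/clients/clients.csv

# List the directory to confirm the file is gone.
hadoop fs -ls /data/clients
```

Note that by default -rm moves the file to the user's HDFS trash (if trash is enabled) rather than deleting it immediately.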
And now you can see that the file is gone and the directory is empty, so we can then remove the directory. Note that, as I mentioned, I could have just moved this over, which would have done both of these operations in one step: it would have copied it to the new directory and removed it from the existing one. That's the -mv command. But since we've already done it manually, I'm not going to run that one. I'm just going to remove a directory now, but first, let's see what happens if we try to remove a directory that still has data in it: hadoop fs -rmdir, which is the command to remove a directory.
And if you recall, the sales directory which we created has something in it. So, you can see that Hadoop doesn't like that; it gives me an error: the directory is not empty. What we need to do is add the -r flag to do a recursive delete. So, I'll try this again: hadoop fs -rm -r /data/sales.
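A minimal sketch of this step, assuming the /data/sales directory from the earlier exercises still contains files:

```shell
# -rmdir only removes empty directories, so this fails with a
# "Directory is not empty" error while /data/sales has files in it.
hadoop fs -rmdir /data/sales

# A recursive delete removes the directory and everything under it.
hadoop fs -rm -r /data/sales
```

Because -rm -r deletes an entire subtree in one shot, it's worth double-checking the path before you run it.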
Now what it did is delete the files within that directory and the directory itself. If I want to take a look, I can run hadoop fs -ls /data. And you can see I have a clients folder and an addresses folder that we set up earlier, but the sales one is gone, as well as the file that was in the sales folder. To delete an empty folder, you can just use -rmdir. So, if you want to try that, we can do hadoop fs -mkdir to create one real quick.
/tmpfolder. And if I want to try to remove that, I can do hadoop fs -rm /tmpfolder, and it gives me a different error saying that it's a directory, not a file. So, -rm is for files and -rmdir is for removing directories, but remember, only if they're empty: hadoop fs -rmdir /tmpfolder.
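The whole round trip from this step, sketched out (the /tmpfolder name is just a throwaway example directory, and these commands assume a running Hadoop cluster):

```shell
# Create an empty directory to experiment with.
hadoop fs -mkdir /tmpfolder

# -rm targets files, so running it against a directory is rejected
# with an error saying the path is a directory, not a file.
hadoop fs -rm /tmpfolder

# -rmdir removes the directory -- but only because it is empty.
hadoop fs -rmdir /tmpfolder
```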
Good to go. Now, if you have a lot of files that you need to delete, you can do so using a wildcard, where at the end of whatever path you're entering, you add a wildcard pattern.
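For example, a wildcard delete might look like this (a sketch with a hypothetical path; the glob is expanded by Hadoop's FsShell, so quoting it keeps your local shell from trying to expand it first):

```shell
# Delete every .csv file directly under /data/sales in one command.
# The quotes ensure the * pattern reaches Hadoop, not the local shell.
hadoop fs -rm '/data/sales/*.csv'
```

Be careful with wildcards and recursive deletes together, since a broad pattern can remove far more than you intended.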