In the previous movie we saw how to use tr by looking at its standard translate usage. tr has several options that allow us to work with it in other ways. Instead of translating, it can also delete or filter out characters, and it can dedupe repeating characters, a process that it refers to as squeezing. The three options we are going to be looking at are the -d, -s and -c options. -d will delete characters in the set that we specify. So it simply removes them. Instead of being translated, they just get removed. The -s option will squeeze repeats that are in the listed set.
So it won't necessarily squeeze everything, but the things that we've told it to, it will compress down. If we had for example five Xs in a row, it would turn them into one X instead. And then the -c option says use the complementary set. It's the opposite. So you would use this option with either the -d or the -s option. With the -d option it would mean delete characters that are not in the listed set, right. It's the opposite of that, the reverse. It will become clear once we actually look at some examples. So let's say that I have a simple string. I have "abc1233deee567f".
The threes and the Es are repeated. I am going to take that string, pipe it into tr, use the -d option, and say that the set I want to specify, the thing I'm looking for, is this class called digit, written [:digit:]. So what I am doing is deleting all of the digits, and you can see the output there to the far right. It just gives me the letters. If I use -c, that's the complementary set. So now I'm saying delete everything that is not a digit. So what I'm left with is just the digits. Now that's not just stripping out the letters. That's also stripping out tabs, line returns, anything else that might be in that file at all, anything that's not a digit.
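The two delete examples just described can be tried out with a short sketch like the one below. It assumes a POSIX shell and a tr that understands character classes such as [:digit:]; the variable names are only illustrative and are not from the movie.

```shell
# The sample string from the walkthrough.
input='abc1233deee567f'

# -d deletes every character in the listed set (here, all digits).
letters=$(printf '%s\n' "$input" | tr -d '[:digit:]')

# -c flips the set, so -cd deletes everything that is NOT a digit.
# That includes the trailing newline that printf added.
digits=$(printf '%s\n' "$input" | tr -cd '[:digit:]')

printf '%s\n' "$letters"   # abcdeeef
printf '%s\n' "$digits"    # 1233567
```

Note that both commands take a single set, since nothing is being translated.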
So a lot of times it's easier to specify the thing you want to keep than to list everything that you don't want, right. We are filtering out the opposite version. It can be very handy. Notice that in both of those cases there is only one argument to tr now. Before we had two. We were saying translate from a to b; now we are just saying we want to delete from set a. We don't need to translate to the other set. So there is no reason to have a second argument. Squeeze works the same way. We just have one argument. This time I have tr -s digit and that squeezes all the digits. So you notice in the result there to the far right that now the threes have just become a single 3. It's just 1, 2, 3.
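As a minimal sketch of the squeeze example, again assuming a POSIX shell and the same sample string (the variable names are illustrative):

```shell
input='abc1233deee567f'

# -s squeezes each run of a repeated character from the listed set
# down to a single occurrence. Only digits are in the set here,
# so the three Es are left alone.
squeezed=$(printf '%s\n' "$input" | tr -s '[:digit:]')

printf '%s\n' "$squeezed"   # abc123deee567f
```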
If we use the complementary set, well then it's the opposite. Squeeze everything that's not a digit, so that squeezes the Es. So now there's only one E. There are still two 3s in there. It's 1233. It's the Es that got compressed. Now again that's the complementary set, so everything that's not a digit. That would be line returns, that would be tabs, anything else that's in that file that's not a digit would get squeezed. We can use these two options together. So for example, if I have tr -ds, the first argument you specify is going to be the thing you want to delete.
The second argument is going to be the thing that you want to squeeze. So in the first example I have there, it's going to delete all the digits and then it's going to squeeze all of the letters, and then we come up with abcdef. Now if we use the -c option with that as well, it's important to note that the -c option only applies to the delete. So what the example I have there will do is delete everything that's not a digit and then squeeze the digits. It's not squeezing the opposite of the digits, right. -c just applies to the first argument, the one that goes with -d. So that's how it works, but you may be thinking, well, why in the world would I need something like this? I mean if you have something like a file that has an essay in it, why would you want to transform your letters so that the word sweet became swet? That's not that useful.
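The complementary squeeze and the combined -ds and -cds forms described above can be sketched as follows. This assumes a POSIX shell and a tr that accepts character classes in both sets when deleting and squeezing together; the variable names are illustrative.

```shell
input='abc1233deee567f'

# -cs: squeeze repeats of everything that is NOT a digit.
squeeze_others=$(printf '%s\n' "$input" | tr -cs '[:digit:]')

# -ds: delete the first set (digits), then squeeze the second set (letters).
letters_once=$(printf '%s\n' "$input" | tr -ds '[:digit:]' '[:alpha:]')

# -cds: -c applies only to the delete set, so this deletes the
# non-digits and then squeezes the digits that remain.
digits_once=$(printf '%s\n' "$input" | tr -cds '[:digit:]' '[:digit:]')

# The "sweet" example: squeezing letters in ordinary prose.
swet=$(printf 'sweet\n' | tr -s '[:alpha:]')

printf '%s\n' "$squeeze_others"   # abc1233de567f
printf '%s\n' "$letters_once"     # abcdef
printf '%s\n' "$digits_once"      # 123567
printf '%s\n' "$swet"             # swet
```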
Well, admittedly you won't use it that often, but it is a good tool to have in your toolbox for a couple of special cases. Let's say for example that you want to remove certain characters from a file. We can remove all non-printable characters from the file for example, right. So remove everything that's not a printable character out of the file and then presumably I'd be able to print the file, right? You could also remove other things. You can remove all of the tabs from a file. You can remove all the line returns from a file. You can remove all the punctuation, right. Any of those things are possibilities, but essentially we are saying go through the file and remove the things that I don't want in there.
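A rough sketch of those removal cases follows, assuming a POSIX shell; the sample strings are made up for illustration. One detail worth noting: the newline and the tab are not in the [:print:] class, so the sketch adds \n to the set to keep line breaks when stripping non-printable characters.

```shell
# Remove everything that is not printable, but keep newlines.
# \001 is a control character; the tab is also removed.
printable=$(printf 'a\001b\tc\n' | tr -cd '[:print:]\n')

# Remove all tabs.
no_tabs=$(printf 'a\tb\tc\n' | tr -d '\t')

# Remove all line returns, joining the lines together.
one_line=$(printf 'one\ntwo\n' | tr -d '\n')

# Remove all punctuation.
no_punct=$(printf 'Hello, world!\n' | tr -d '[:punct:]')

printf '%s\n' "$printable"   # abc
printf '%s\n' "$no_tabs"     # abc
printf '%s\n' "$one_line"    # onetwo
printf '%s\n' "$no_punct"    # Hello world
```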
One common use case that I want to give you, just so that you have it and can refer to it, is that you can remove the surplus carriage returns and end-of-file characters that Windows files often have. For the end of a line, a Windows file has both a carriage return and a line feed, but on Mac and Unix, that's just a line feed character. So what we want to do is get rid of the extra carriage return character. Windows files can also have an end-of-file character to indicate the end of the file, which Unix sometimes doesn't like either. So we can strip those out and essentially take something that was a Windows file and make it a happy Unix file by stripping out the octal codes that I have there. That \015\032? There is no way you would know this.
You would have to look those up to know what they are, but I wanted to give them to you because this is a handy tool to have in your toolbox. And an example where you might squeeze? Well, you might want to remove all the double spaces from your file, right. So every time you hit a period, if you accidentally hit the spacebar twice and you really wish you'd only done it once, well, we can squeeze all those spaces out. And not just double spaces: if you accidentally hit triple spaces or however many more there were, this would squeeze them all down so that there would only be one space. There are all sorts of variations on these. There are ways that you can combine these. You could for example remove all characters that are not printable and squeeze the spaces in the file, and do those together.
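The Windows clean-up and the space squeezing described above might look like this. It is a sketch under a few assumptions: a POSIX shell, made-up sample text, and \015 and \032 being the octal codes for carriage return and Ctrl-Z (the legacy end-of-file marker). On a real file the first command would be run as something like tr -d '\015\032' < windows.txt > unix.txt, where both file names are hypothetical.

```shell
# Strip carriage returns (\015) and the end-of-file marker (\032).
cleaned=$(printf 'line one\r\nline two\r\n\032' | tr -d '\015\032')

# Squeeze any run of spaces down to a single space.
single_spaced=$(printf 'One.  Two.   Three.\n' | tr -s ' ')

# Do both kinds of work at once: delete everything that is not
# printable (keeping newlines), then squeeze the spaces.
tidy=$(printf 'a\001  b   c\n' | tr -cds '[:print:]\n' ' ')

printf '%s\n' "$cleaned"         # line one / line two, on two lines
printf '%s\n' "$single_spaced"   # One. Two. Three.
printf '%s\n' "$tidy"            # a b c
```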
The main thing I want you to see is that tr doesn't just translate. It also has these other special features, which could almost be programs on their own, but they're really just options to tr: tr -d and tr -s. So make sure that you have both of them in your toolbox.