In this movie, we're going to take a look at another simple but powerful Unix tool called TR. TR is short for Translate, and what it does is copy the input it receives to output, but with the substitution of selected characters, according to translation rules that we give it. So at its simplest we would have echo 'a,b,c', and we'll pipe that into tr, and the first argument of tr is going to be the thing we want to search for, our search string. So we're going to search for all of the commas, and the second argument will be the replacement string.
So we'll replace it all with dashes. So as you might guess, it takes all the commas and it translates them into being dashes. Now this is a totally valid use. But translating one character to one character doesn't really reveal how TR works. So let's try another better example. Let's say we have echo and for this string let's use the numbers 1 to 6, so you can just peck away through your keyboard, just pick a random set of numbers, but not going any higher than 6. And then we're going to pipe that into tr, and this time we're going to say that our search string is the numbers in sequence 1 to 6.
1, 2, 3, 4, 5, 6, and the replacement string that we'll use for that, I am going to use the letters E, B, G, D, A, E. Those map to the tunings on a six string guitar. When we hit Return, now notice what it did. It took all of the 1s and it turned them into Es and it took all of the 2s and it turned them into Bs and all the 3s became Gs and so on. Everything that it found in the search string, not only did it find it, but then it said "Oh, what position are you in the search string? Let me map that to the replacement string and find the item that's in the same position there." It's a mapping, right? So the number 6 gets mapped to E, number 4 gets mapped to D, right? Do you see how that works? Notice also that if we had another number, let's put in a 9 or any other character really, it wouldn't get matched, right? It doesn't get translated, because it's not in our translation string.
Therefore it gets left alone. It's only those selected characters and those selected characters get translated based on their position in the search string and in the replacement string, so position does matter. This translation is a lot like the code breaking that you might have done as a kid. In order to find out a secret message, you'd have to swap all the As for another letter and all the Bs for another letter. When you were finally done, you would have decoded the message by doing this simple alphabetic translation. We can actually do this with TR. A simple version would be to encode a string using something called ROT-13.
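As a rough sketch, the two examples so far might look like this at the prompt. The digit string is just an arbitrary sample, not necessarily the one typed in the movie, and the variable names are only there for illustration.

```shell
# Translate every comma into a dash.
dashed=$(echo 'a,b,c' | tr ',' '-')
echo "$dashed"     # a-b-c

# Map each digit 1-6 to the letter in the same position of 'EBGDAE'.
# The 9 is not in the search string, so it passes through unchanged.
strings=$(echo '1625349' | tr '123456' 'EBGDAE')
echo "$strings"    # EEBAGD9
```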
ROT-13 is just rotating all the characters 13 places. So the A becomes N, B becomes O, C becomes P, and so on. Let's try it. echo 'This is ROT-13 encrypted.' A single quote and we'll pipe that into tr and now for tr's search string, we're actually going to use character sets. We could write out A, B, C, D, and all that, but we can shorthand that with A-Z, all the letters A to Z in that order, followed by the lowercase letters, a-z in that order, because it is case-sensitive.
Then in the replacement string, what we want to do is rotate all of those 13 positions. So N-Z would take care of letters A through M, and then A-M, capital M, would be the second half of that. Same thing for the lowercase letters: n-z, a-m, and we translate it. Now what we get is something that is ROT-13 encoded. And we can just take that and decode it by rotating it 13 places again. Since there are 26 letters in the alphabet, rotating by 13 twice brings every letter back to where it started, and now we can decode the message just as well.
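Here is a minimal sketch of that ROT-13 round trip, using the same search and replacement sets described above. The variable names are just for illustration.

```shell
# Encode: A-M map to N-Z, N-Z map to A-M, and the same for lowercase.
encoded=$(echo 'This is ROT-13 encrypted.' | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$encoded"    # Guvf vf EBG-13 rapelcgrq.

# Decode: running the very same translation again rotates another 13 places.
decoded=$(echo "$encoded" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$decoded"    # This is ROT-13 encrypted.
```

Notice that the digits, the dash, the spaces, and the period are not in the search string, so they come through untouched.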
So that's probably the simplest kind of encoding and encryption there is. It is just that simple rotation cipher. But hopefully now you start to see the idea of how translate does its thing. Just to make it really clear, let me show you what it's not for. If we have for example something like 'already daytime' and we want to swap out tr 'day' for 'night', it doesn't replace the word day with the word night. That's not what it's doing. What it's doing is it translates the d into n and the a into i and the y into g, right? Do you see that? So 'already' also got affected by that.
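To make that concrete, here is roughly what that command does. It's a small sketch, with a variable name chosen just for this example.

```shell
# tr maps characters, not words: d->n, a->i, y->g.
# The extra 'h' and 't' in 'night' have no partner in 'day', so they are never used.
swapped=$(echo 'already daytime' | tr 'day' 'night')
echo "$swapped"    # ilreing nigtime
```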
We're translating each of those characters that we find, not doing a Find & Replace like you might do with the word processor. Now we don't have to provide a simple one-to-one replacement. For example, let me just paste in an example here. I have a long string and then have as my search string bedf and then the numbers 5 through 9 in that order. I'm replacing them all with x, just by itself. See what it did? It found those specific items and replaced them all with x. To see what it's actually doing, a good way to do that is just let's put another character there. Let's say z. Now notice that the b got replaced with the x, but everything else got replaced by the z. That's because it's repeating the z.
It's the same thing as if I've done something like that, right. Whatever the last item is, it just gets repeated multiple times until we have enough characters to match our search string. So if the replacement set is smaller, then the last item repeats. If the replacement set were bigger than the search set, well then those items will just never get reached. There would be nothing that ever would map to those remaining characters. So now that you understand what it is and how it works, what are some real-world use cases where you would use this? Well, one example might be, if you remember we have a file here called people.txt.
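The repetition rule for a short replacement set can be sketched like this. The input string is a made-up sample, not the one pasted in the movie, and this assumes a tr that pads the replacement set with its last character, as GNU and BSD tr do. Other implementations may behave differently.

```shell
input='abcdef0123456789'   # hypothetical sample string

# One replacement character: every match becomes x.
all_x=$(echo "$input" | tr 'bedf5-9' 'x')
echo "$all_x"       # axcxxx01234xxxxx

# Two replacement characters: b gets x, everything else gets the repeated z.
padded=$(echo "$input" | tr 'bedf5-9' 'xz')
echo "$padded"      # axczzz01234zzzzz

# Writing the z out nine characters long gives the same result.
spelled=$(echo "$input" | tr 'bedf5-9' 'xzzzzzzzz')
echo "$spelled"     # axczzz01234zzzzz
```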
I am in the unix_files folder already. It just has names of people. Well, what if I wanted to make all of those lowercase? We could do that. Let's say tr and we'll do capital A-Z should be translated into a-z, and we can't just say people.txt after that, right? There are only two arguments allowed. Let's try that. You'll see that it comes up and says no, I only want two arguments. Instead, we redirect our input from the file, and remember how we did that. So we're now taking the contents of the file and passing those in as input to tr and then it gives us the output.
It now made those all lowercase. You can also use those same character classes that we just saw in the regular expression movie: [:upper:] and then here would be [:lower:], right? Does the exact same thing. So we can use upper and lower because those are well-ordered sets. Using some of those other classes like punctuation, they would work but the order might not be obvious to you, what order those characters come in.
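Here is a small sketch of the lowercase conversion. Since the contents of people.txt aren't shown here, it builds a temporary stand-in file with two made-up names. With the real file you would just redirect from people.txt instead.

```shell
# Hypothetical stand-in for people.txt (the names are invented for this sketch).
sample=$(mktemp)
printf 'Alice Smith\nBob Jones\n' > "$sample"

# tr only reads standard input, so the file comes in through < redirection.
# (Adding the filename as a third argument would be rejected.)
lower_range=$(tr 'A-Z' 'a-z' < "$sample")
echo "$lower_range"

# The character classes do the same job.
lower_class=$(tr '[:upper:]' '[:lower:]' < "$sample")
echo "$lower_class"

rm -f "$sample"
```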
Another example where it might be useful is let's imagine that we have something that's in a foreign language, right, and it has lots of accented characters in it. We need to get all those accents out of there. For whatever reason, we aren't going to be able to print those accents so we need to just take all of the accented Es and turn them into regular Es. That will do it for us. It will strip them out and turn them into their unaccented equivalents. Another really useful case is let's imagine we have a file. I have a file here, a new file I've added. It's us_presidents.csv. CSV is for Comma-Separated Values. If we take a look at the head of that file, us_presidents, you'll see that I've got values here that are separated by commas. So George Washington, 1789, and so on.
That's comma-separated values. It might actually be a little easier to understand the data by looking at it in a table format. So I've essentially got the number of the President, their name, the year they started and finished their term, their party, their state, and then where their Wikipedia entry is. So that's what I've got. I've just got all these values not in a table, but separated by commas, right, and then I could import them into things and that kind of thing. Well, in addition to comma-separated values, a very common format is to have tab-separated values, and you put tabs in between each of them. So using tr, that's a cinch, right? We just say tr, take all of the commas and let's replace those with tabs.
Tabs, the special character is this \t, and so we'll feed in our us_presidents.csv and you can see the difference. Now you see those tabs and we can just take that same thing and let's output it now to us_presidents, and we'll do TSV for tab-separated values. So there we go. Now I have my comma-separated values and I have a tab-separated version with just one simple command. So those are the basics of how to work with tr. There are a couple of options to give you some nice features. So let's look at those in the next movie.
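The comma-to-tab conversion can be sketched like this. Because the real us_presidents.csv isn't reproduced here, the sketch writes a one-line sample file into a temporary folder. The file names and the row are stand-ins for illustration only.

```shell
# Hypothetical one-row sample standing in for us_presidents.csv.
workdir=$(mktemp -d)
printf '1,George Washington,1789,1797\n' > "$workdir/sample.csv"

# Read the CSV on standard input, turn commas into tabs, and write a TSV file.
tr ',' '\t' < "$workdir/sample.csv" > "$workdir/sample.tsv"

tsv=$(cat "$workdir/sample.tsv")
echo "$tsv"

rm -rf "$workdir"
```

With the real file, the same idea would be tr ',' '\t' < us_presidents.csv > us_presidents.tsv. Keep in mind this swaps every comma, so it only suits data where no field contains a comma of its own.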