In the last movie, we got familiar with the syntax of sed, but all of our searching so far has been with literal text strings. Now we're going to learn to use regular expressions with sed. It may seem like sed is really similar to grep. That's because it is. All sed is, is grep and then substitute. So put another way, anything that you can find with grep, you can change with sed, and that includes making good use of regular expressions. So as a simple example of this, let's just have echo "Who needs vowels?" and we'll pipe that into a sed expression where we will look for any one of a, e, i, o, or u by putting them inside a character set, and we'll replace each match with an underscore. We'll do it globally. So there you go.
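As a rough sketch of the command described here (the exact quoting is an assumption, since the transcript doesn't show what was typed on screen):

```shell
# Replace every vowel in the character set [aeiou] with an underscore.
# The trailing g makes the substitution global: every match on the line.
result=$(echo "Who needs vowels?" | sed 's/[aeiou]/_/g')
echo "$result"   # Wh_ n__ds v_w_ls?
```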
You see it took all of the vowels that were in that character set and replaced them with the underscore. So we can use regular expressions. Now, the regular expressions here work exactly like they do with grep, meaning that we also have an issue with basic versus extended regular expressions. So for example, if I put the plus sign in there, it doesn't work anymore, because the plus is part of the extended regular expression set and just like grep, we would use the -E option to be able to use those extended features. So let's try a couple.
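To make the basic versus extended difference concrete, here is a small sketch that reuses the vowel example. The variable names are only for illustration.

```shell
# Basic regular expressions: + is an ordinary character, so the pattern
# only matches a vowel followed by a literal plus sign. Nothing changes.
basic=$(echo "Who needs vowels?" | sed 's/[aeiou]+/_/g')
echo "$basic"      # Who needs vowels?

# Extended regular expressions (-E): + means "one or more of the preceding",
# so a run of vowels such as "ee" collapses into a single underscore.
extended=$(echo "Who needs vowels?" | sed -E 's/[aeiou]+/_/g')
echo "$extended"   # Wh_ n_ds v_w_ls?
```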
Let's say we have, for example, our fruit file. That's just cat fruit.txt, and for that let's write a sed expression that will take any line that begins with p and replace that p with space, space, p. Now, notice I had to repeat the p again. Because I'm finding it, it's part of what gets replaced, so I want to make sure that I still include it in the replacement. So there we go, and we'll just run that on our fruit.txt file.
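A sketch of that command follows. The contents of fruit.txt are never shown in the transcript, so the file created here is a hypothetical stand-in, and the temporary directory is only there to keep the example self-contained.

```shell
# Hypothetical stand-in for fruit.txt; the real file's contents are assumed.
dir=$(mktemp -d)
printf '%s\n' pear raspberry banana peach apple pineapple > "$dir/fruit.txt"

# ^p anchors the match to a p at the start of a line.
# The p is part of the match, so it is repeated in the replacement.
indented=$(sed 's/^p/  p/' "$dir/fruit.txt")
echo "$indented"

rm -r "$dir"
```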
So every line that began with p got indented two spaces. We could also leave out the p and just indent everything two spaces. A variation on that would be to, instead of having two spaces, put a right angle bracket (>) in there. That does the same thing as when we quote a mail message, right? If we reply to a mail message, our mail editor might stick those in front of each line of the text we're replying to. One important thing that you might run into is you might think, well, instead of spaces in front of every line, what if I wanted to put a tab? And we have this shortcut for the tab character, which is the \t. That doesn't work here.
sed doesn't understand that \t, or at least the Mac version of sed doesn't. There are other versions that do, GNU sed for one, but the Mac version doesn't understand it. So in order to get a tab, the trick that you need to know is that in bash, if we want to type a literal tab character, the way to do it is to type Ctrl+V and then the actual key. So hit Ctrl+V and then Tab, and then we'll actually get that tab effect. Ctrl+V works for other characters as well: Ctrl+V and Enter, Ctrl+V and Escape.
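The line-prefix variations can be sketched as below. A Ctrl+V keystroke can't be shown in a script, so for the tab case this sketch swaps in a different technique: it lets printf produce the literal tab and splices it into the sed expression. The sample lines are hypothetical.

```shell
# Hypothetical sample lines standing in for the fruit file.
lines=$(printf '%s\n' apple pear plum)

# Indent every line by two spaces (no p in the pattern this time).
spaced=$(echo "$lines" | sed 's/^/  /')

# Prefix every line the way a mail client quotes a reply.
quoted=$(echo "$lines" | sed 's/^/> /')

# Not the Ctrl+V method: printf creates the literal tab character, and the
# double quotes let the shell expand $tab before sed sees the expression.
tab=$(printf '\t')
tabbed=$(echo "$lines" | sed "s/^/$tab/")

echo "$spaced"
echo "$quoted"
echo "$tabbed"
```

Because sed receives a real tab character rather than the two characters \t, this form should behave the same whether or not the installed sed understands \t.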
Ctrl+V types the literal character for you. You won't need it most of the time, but this is one of those cases where it definitely comes in handy. Let me give you one last example of this before we move on. Inside the directory I am in, I've added a new file called homepage.html. It's just a basic homepage for a fake company, so you can use any HTML you have. I just wanted to have some HTML to work with. Let's construct a sed script that will remove all of the HTML tags from this, so that what we're left with is just text.
Well, the way we could do that with sed is to use -E, capital E, so we can make use of extended regular expressions in our substitute. And we know we're going to want to find everything that's inside those tags, the angle brackets. So I'll just type those for now. And then we're going to remove them globally inside homepage.html. So now we just need to write a bit more of our regular expression. Inside those angle brackets, what are we going to have? What's allowed to be in there? Well, you could say a lot of things. I'm going to say that the thing that defines it is that it is not one of those angle brackets.
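Here is a sketch of the finished command this is building toward, with a plus sign after the character class. The HTML below is a hypothetical stand-in for homepage.html, which isn't shown in the transcript.

```shell
# Hypothetical stand-in for homepage.html.
dir=$(mktemp -d)
cat > "$dir/homepage.html" <<'EOF'
<html>
<body>
<h1>Welcome to <em>Example Co.</em></h1>
<p class="intro">We make things.</p>
</body>
</html>
EOF

# -E enables extended regular expressions so that + works unescaped.
# <[^<>]+> matches <, then one or more characters that are not angle
# brackets, then >. The empty replacement deletes each match, and g
# repeats the deletion for every tag on the line.
text=$(sed -E 's/<[^<>]+>//g' "$dir/homepage.html")
echo "$text"

rm -r "$dir"
```

Lines that held only tags come out empty rather than disappearing, since the substitution removes text within a line but never the line itself.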
It could be any other character besides those, and I'm not going to be picky. And then we'll put our plus sign after it to show that there can be one or more of those. So that then takes our HTML and strips out those tags. You can certainly come up with better regular expressions using more advanced techniques. You can see, for example, it didn't filter out this first tag because it's broken across two different lines, and sed works on one line at a time. It's not perfect, but you do get the idea of what it can do for you. Now let's talk about back references. Back references are actually part of regular expressions and sed makes good use of them. Let me show you a good example.
Let's say we have echo 'daytime' and we want to change that using sed, so that daytime is made into daylight. We might be looking for things that are much more complicated. We might not be looking for the literal daytime, right? We might be constructing some very fancy regular expression here. We might be saying, well, look for any three characters followed by time, we don't care what they are. And what we want to do is take that thing, whatever that thing was, and use it again in the replacement.
We don't know what it is ahead of time. It's not necessarily day. It might be some other three letters. Well, the way that we do that is with a back reference. We put parentheses around the part of the search string that we want to capture, and a back reference is the backslash and then the number of that set of parentheses. So we can have more than one of these defined in our search string. In fact, sed supports up to nine, so you can have up to nine of these, and they will then say "Ah, the first set of parentheses, well, that corresponds to \1. The second set of parentheses, that corresponds to \2, and so on." Now, if we try and run this, it doesn't work, and that's because these parentheses here have to either be escaped to work with basic regular expressions or, if we don't escape them, we have to say this is an extended regular expression.
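The two working forms might look like this. It's a sketch that assumes the pattern is three dots followed by time, which matches the "three letters" described here. The variable names are only for illustration.

```shell
# Extended syntax (-E): plain parentheses capture, \1 reuses the capture.
extended=$(echo 'daytime' | sed -E 's/(...)time/\1light/')
echo "$extended"   # daylight

# Basic syntax: the same capture group, with the parentheses escaped.
basic=$(echo 'daytime' | sed 's/\(...\)time/\1light/')
echo "$basic"      # daylight

# The capture is not tied to "day"; any three characters are carried over.
other=$(echo 'xxxtime' | sed -E 's/(...)time/\1light/')
echo "$other"      # xxxlight
```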
So anytime we use those parentheses in there, it either has to be extended or they have to be escaped. And just to make sure that you understand the difference here, let's say that instead of day, the input said something like xxx for now. So you can see that it took those same three characters that it found. It didn't care whether they were day or something else. It took those and dropped them into the replacement string. Let me give you a real world example. I think that will make it clear why this is really useful. Let's say that I have a name like Dan Stevens.
I can pass that into a sed script that will say all right, take any characters that could be in the first name, followed by a space, followed by any characters that could be in the last name, and then reverse them with a comma between them. Look at that. Dan Stevens suddenly becomes Stevens, Dan. So you can see how you can do this, not just to this little bit of input that I'm sending it, but you could do it to an entire file. Let's try something similar by using our fruit file. So I have a sed expression here that's going to look for either apple, pear, plum, or peach, and for any of those it will append tree after it.
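Both commands can be sketched as follows. The [A-Za-z]+ character classes are an assumption about which characters count as part of a name, and the fruit file is again a hypothetical stand-in for the real fruit.txt.

```shell
# Two capture groups, swapped in the replacement with a comma between them.
# [A-Za-z]+ is an assumed definition of "characters that could be in a name".
swapped=$(echo 'Dan Stevens' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2, \1/')
echo "$swapped"   # Stevens, Dan

# Hypothetical stand-in for fruit.txt.
dir=$(mktemp -d)
printf '%s\n' pear raspberry banana peach apple pineapple > "$dir/fruit.txt"

# Alternation inside the group: \1 is whichever fruit name matched,
# and " tree" is appended after it.
trees=$(sed -E 's/(apple|pear|plum|peach)/\1 tree/' "$dir/fruit.txt")
echo "$trees"

rm -r "$dir"
```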
So we get pear tree, then raspberry and banana are left as they were, then we get peach tree, apple tree, and pineapple tree, because pineapple contains apple. So any of the text that matched our regular expression got reused in our output. Now, there is a lot more that sed can do, but this shows you some of its most common uses, and I think it gives you a solid foundation for exploring further on your own.