In this chapter, we will be talking about lookaround assertions. Lookaround assertions are made up of two main types: lookahead assertions, and lookbehind assertions, and those are further divided into positive, and negative. In this movie, we'll start by examining positive lookahead assertions. The metacharacters that we are going to use to define a positive lookahead assertion are the question mark, followed by the equals sign, and you will use these inside a grouped expression as the first characters inside the parentheses. A lookahead assertion is an assertion of what ought to lie ahead.
We're telling the regular expression engine, essentially, take this grouped expression, and look ahead, and see if you can find a match. If our lookahead expression fails, well then the whole match will fail. But if not, then the engine will keep going, and see if it can make a match out of everything else in the expression, and we can use any valid regular expression inside the lookahead assertion. The most important point about these assertions, though, is that they are zero-width. Remember we saw that anchors and word boundaries are zero-width matches as well? They refer to a position, rather than to an actual character.
The same is true with assertions. The assertion will return true or false about whether it makes a match, but it does not actually match any characters. That's why they're called assertions. They just assert that something ought to be true about the match, but without doing anything else. Lookahead assertions are going to be supported by most modern regular expression engines. Perl first introduced them, and since then, Perl-compatible regular expression engines typically support them. Engines developed prior to Perl, like the UNIX tools developed during the 1970s though, do not.
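To make the zero-width idea concrete, here is a minimal sketch using Python's built-in `re` module. The choice of Python is an assumption for illustration; the movie itself does not name a language. A lookahead used on its own succeeds at a position but consumes no characters, just like a word boundary.

```python
import re

# A lookahead by itself: it succeeds at position 0 of "seashore",
# but the match it reports is empty (zero characters wide).
m = re.search(r"(?=sea)", "seashore")
print(m.span())         # (0, 0)
print(repr(m.group()))  # ''

# A word boundary behaves the same way: a position, not a character.
b = re.search(r"\b", "seashore")
print(b.span())         # (0, 0)
```

The span starts and ends at the same index, so the assertion said "yes, this is here" without matching anything.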
So as I said at the start, we define a lookahead assertion by using a question mark, and an equals, inside a grouped expression. That's very similar to what we used for a non-capturing group, but with an equal sign instead of a colon. And just like the non-capturing group, the purpose of the question mark is to indicate that the group has a different meaning than normal. Then it's the second character that comes after the question mark that defines what kind of special meaning it will have. The equal sign is what defines that it will be a lookahead assertion. The equals sign is easy to remember, and makes sense as being positive, because what we are saying is that our expression should be equal to something.
Be careful not to put a space after the equals sign, though. It may seem more readable, but that space has meaning, and becomes part of the regular expression. It should just be question mark, equal sign, and then immediately the regular expression you want to assert. Let's look at some examples. So let's say that we have an assertion that we should have a match for seashore -- that's inside our group -- and then after that, we have the literal characters S, E, A. So if we have the string seashore, then what happens is the first thing is the regular expression engine says, alright, let me look ahead; I should find a match for S, E, A, S, H, O, R, E.
Great! It does have that match. It passes the assertion. So since the assertion passes, now it proceeds to the second part of the expression; the non-grouped part, which is just the characters S, E, A. It does match those, and that's the part that matches; not the whole word, not seashore, just S, E, A. If we tried the same thing with seaside, the assertion would start running, and it would say, I am looking for this pattern. It looks for seashore, it says nope, it fails, and it just stops right there, and says not a match, and never attempts the second part of the expression.
Now, you may notice that I am repeating S, E, A there in both of those. You can actually write this same expression this way as well; with the S, E, A first, and then our assertion this time is not for seashore, but just for shore. What we are essentially saying is, if you find a match for S, E, A, now look ahead at what comes next, and see if you have a match for shore. See if that's what follows. Both of these would return the exact same thing: a match on just the characters S, E, A, and only when it's inside seashore, not seaside.
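The two ways of writing the expression can be sketched in Python's `re` module (the language and the helper name `matched_text` are assumptions for illustration, not something from the movie). Both forms should match only the characters S, E, A, and only inside seashore.

```python
import re

words = ["seashore", "seaside"]

# Form 1: assert the whole word first, then match "sea".
lookahead_first = re.compile(r"(?=seashore)sea")
# Form 2: match "sea", then assert that "shore" follows.
lookahead_after = re.compile(r"sea(?=shore)")

def matched_text(pattern, text):
    """Return the matched characters, or None if there is no match."""
    m = pattern.search(text)
    return m.group() if m else None

results_first = [matched_text(lookahead_first, w) for w in words]
results_after = [matched_text(lookahead_after, w) for w in words]
print(results_first)  # ['sea', None]
print(results_after)  # ['sea', None]

# The space pitfall: a space after the equals sign is a literal space
# (in the default, non-verbose mode), so it must appear in the text.
with_space = re.compile(r"sea(?= shore)")
print(matched_text(with_space, "seashore"))   # None
print(matched_text(with_space, "sea shore"))  # sea
```

The last two lines show why the space matters: with the space inside the assertion, seashore no longer passes, and only "sea shore" written as two words does.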
We'll talk about why you might choose one example over the other in the next movie. But for now, let's try some examples, so you can get the hang of it. To start out with, let's put in our seashore and seaside here, and then for our regular expression, let's first put in our first one. If you just do S, E, A, we see what it matches there, and it matches it in both of them. What we want to do now is put in that lookahead assertion, and say, only match it if you can first match seashore. So you can see that it does that, and it only finds it in the first word, not in the second word.
And as I said, that's the same thing; I'll just cut, and paste that at the end. Take out the sea. It's the exact same thing as if we do it that way. Now, contrast that against using a non-capturing group; I want you to see the difference there. It's not the same thing. We are talking about what gets matched, not what gets captured. I want to make sure you understand that difference. Before, we were talking about capturing, so don't mistake the two. See, here the match is actually for seashore, and we've told it that shore is a non-capturing group.
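The difference between matching and capturing can be sketched side by side in Python's `re` module (again, the language is an assumption for illustration). The same word is run against a non-capturing group, a lookahead, and a capturing group.

```python
import re

text = "seashore"

# Non-capturing group: "shore" is part of the match, just not captured.
non_capturing = re.search(r"sea(?:shore)", text)
# Lookahead: "shore" must follow, but it is not part of the match.
lookahead = re.search(r"sea(?=shore)", text)
# Capturing group: "sea" is captured, but the match is the whole word.
capturing = re.search(r"(sea)shore", text)

print(non_capturing.group())   # seashore
print(non_capturing.groups())  # ()
print(lookahead.group())       # sea
print(capturing.group())       # seashore
print(capturing.group(1))      # sea
```

Only the lookahead version leaves shore out of the match itself; the other two differ only in what is saved for later use.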
Here we've said, don't match shore at all; just look ahead, and see if it's there. It's an important difference. It's also not the same thing as a capturing group; don't mistakenly think that this somehow is equivalent. Here we're capturing sea. We want to capture that for use later in a Find and Replace, or something like that, but we are still matching the entire word: seashore.
Let's try a more complex example. I am going to open up this text here. This is Ralph Waldo Emerson's Self-Reliance; just a text I can copy. This is in the exercise files, and then let's just paste it in here. And what I want to do is I want to find all words that are followed by a comma.
So we know how to do the basics of that. Let's say we are going to find backslash b, for our word boundary. We are going to need a word boundary at the end as well, and in between those, we are going to have the characters A to Z, a to z, and let's put in apostrophe as well, so you get words that have apostrophes, and repeated, so there we are. Now we've got all the words. What we want is the words that end in a comma. Well, you could put a comma at the end of this, and find those words, but then we've matched both the word, and the comma. What we want instead is to use a lookahead assertion to say, look ahead for that comma, but don't actually match it.
Do you see the difference there? Try it out a few times back and forth if you need to, 'til you get the hang of it. The difference is that we are asserting that the comma ought to be present, but we are not making it part of our match. And again, you can compare that against the non-capturing parentheses, and see that that does include it in the match. What we are interested in is using the assertion to not match it. So, hopefully you're starting to see what a powerful tool lookaround assertions can be. They allow you to look around the area that you're matching to satisfy certain conditions, and that's a powerful tool.
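Here is a minimal sketch of the words-followed-by-a-comma exercise in Python's `re` module. The language is an assumption, and the sample sentence is made up for illustration; it is not taken from the Emerson text in the exercise files.

```python
import re

# Made-up sample sentence (hypothetical, not from the exercise files).
sample = "First, read the text; then, if you're ready, try the pattern."

# Word pattern from the walkthrough: boundary, letters or apostrophes, boundary.
word = r"\b[A-Za-z']+\b"

# Literal comma at the end: the comma becomes part of each match.
with_comma = re.findall(word + r",", sample)
# Non-capturing group: the comma is still part of each match.
with_group = re.findall(word + r"(?:,)", sample)
# Lookahead: the comma must be there, but it is left out of the match.
with_lookahead = re.findall(word + r"(?=,)", sample)

print(with_comma)      # ['First,', 'then,', 'ready,']
print(with_group)      # ['First,', 'then,', 'ready,']
print(with_lookahead)  # ['First', 'then', 'ready']
```

All three patterns find the same three words, but only the lookahead version returns them without the trailing comma.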
We'll learn about another powerful way that we can make use of them in the next movie.