Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
So far, we've been talking about positive lookahead assertions, but they have an opposite: negative lookahead assertions. The metacharacters we use for negative lookahead assertions are the question mark followed by the exclamation point. We use them the same way that we use the positive ones. Inside a grouped expression, the very first character inside the parentheses should be a question mark, then an exclamation point, and then our regular expression. The question mark indicates that this grouped expression has a different meaning than normal, and the exclamation point means that the meaning is negative, or not. The exclamation point makes sense, because in many programming languages, the exclamation point is used to mean not.
So if we had an exclamation point followed by an equal sign, it would mean not equal. Here, we are just using the exclamation point by itself to mean negative: not this regular expression. Be careful that there is no space after the exclamation point. You just start your regular expression right away. If you put in a space, it becomes part of the expression and changes what the expression matches. To see this in context, it works the same way as our positive ones did, except it returns the opposite set of results. So for example, if we had a negative lookahead for seashore, followed by sea, that would match sea inside seaside, but not in seashore.
It still matches the same thing; S, E, A, but it's looking ahead, and looking for the opposite case; the case when seashore is not what it finds. If you step through it the way that the regular expression engine does, it first checks to see, do I find seashore? And if it does find it, it says, nope, it's a failure then, it's not a match, and we move on. If it doesn't find seashore, then it rewinds and checks for S, E, A, and makes the match. And of course, we can write that in the reverse by putting sea, and then have a negative lookahead assertion for shore after it.
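The two forms just described can be sketched in Python. The transcript does not show the on-screen patterns, so the exact spellings below are an assumption based on the description; the sample text is made up for illustration.

```python
import re

# Hypothetical sample text containing both words from the example.
text = "seaside seashore"

# Lookahead first: refuse any position where "seashore" begins, then match "sea".
lookahead_first = re.compile(r"(?!seashore)sea")
# Reverse form: match "sea", then refuse the match if "shore" comes next.
lookahead_after = re.compile(r"sea(?!shore)")

first_spans = [m.span() for m in lookahead_first.finditer(text)]
after_spans = [m.span() for m in lookahead_after.finditer(text)]

print(first_spans)  # only the "sea" inside "seaside"
print(after_spans)  # same result from the reverse form
```

In both cases the match itself is only the three characters of sea; the assertion does not consume any text.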
So match sea, then look ahead, and make sure the next part is not shore. Now, it's easy to overlook the power that these give you, because you may think, oh, well they're just the opposite of positive lookahead assertions. But here's the thing. Regular expressions so far have not had a way for us to talk about when not to match something. We had negative character sets, which allowed us to rule out characters, but there's no other way to talk about when to not match an entire pattern. Negative lookaheads give us a way to describe expressions that should be rejected.
That's an important and powerful tool. So, for example, let's say we wanted to find the word online. We wanted to find it only in cases when it wasn't followed by training. So we could have an expression that was online, and then a negative lookahead for space training. That would not match online training, but it would match online courses. Even more powerful, we can use wildcards to say that there might be words in between. So we want to find online anytime that it's not followed by training, even though there might be words like video in between. So we'd find online videos, and online courses, but not online video training.
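As a rough sketch of the online example in Python: the two patterns below are one plausible reading of the description, not necessarily what was typed on screen, and the phrase list is a hypothetical test set.

```python
import re

# Hypothetical phrases covering the cases mentioned above.
phrases = [
    "online training",
    "online courses",
    "online videos",
    "online video training",
]

# Reject only when " training" follows immediately.
immediate = re.compile(r"online(?! training)")
# Reject when "training" appears anywhere later on the same line.
anywhere = re.compile(r"online(?!.*training)")

immediate_hits = [p for p in phrases if immediate.search(p)]
anywhere_hits = [p for p in phrases if anywhere.search(p)]

print(immediate_hits)
print(anywhere_hits)
```

Note that the first pattern still matches online video training, because training is not the very next word; only the wildcard version rules that phrase out.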
Let's try some more examples. Let's start with this sentence. I am going to paste in, The black dog followed the black car into the black night. So you see we've got the word black three times, and we can write a regular expression that will match the word black using the word boundaries on either side to make sure we get just that word, and it matched it three times. Well, we can add our negative lookahead assertion after it, with a regex inside, and say we want to find black anytime it is not followed by dog. So now we've found the other two, but we've ruled out the first one.
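A minimal Python sketch of the black dog example follows. The sentence is the one quoted above; the patterns are assumed from the description, and the positive form with the equal sign is included only for contrast.

```python
import re

sentence = "The black dog followed the black car into the black night."

# The whole word "black", wherever it appears.
all_black = re.compile(r"\bblack\b")
# "black" only when it is NOT followed by " dog".
not_before_dog = re.compile(r"\bblack\b(?! dog)")
# "black" only when it IS followed by " dog" (positive lookahead).
before_dog = re.compile(r"\bblack\b(?= dog)")

all_starts = [m.start() for m in all_black.finditer(sentence)]
negative_starts = [m.start() for m in not_before_dog.finditer(sentence)]
positive_starts = [m.start() for m in before_dog.finditer(sentence)]

print(all_starts, negative_starts, positive_starts)
```

The negative and positive results together cover every occurrence found by the plain pattern, which is what "opposite set of results" means here.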
Of course, that's the opposite of what we'd get with the equal sign there, which would find black when it is followed by dog, but not any other time. Let's try the phone number example that we had before. We had three different phone numbers, and we wrote ourselves a regular expression that checks each one three different ways. It checks that the digits are all zero to five, it checks that the number has four, three, two, one in it, and it checks that it matches the format of a phone number. We also need to turn on multi-line anchors here, because we put the phone numbers on separate lines, and we're using anchors around this expression.
So what if we wanted to change this one to a not? Now, it finds numbers whose digits are zero to five, but that do not have the sequence four, three, two, one in them, and that still match the pattern. We've used a combination of a positive lookahead and a negative lookahead. The negative lookahead's purpose is really to rule out a case. To say, I want to find everything, but I know something about a special case that I want to rule out here, and make sure that we don't include it in the match. That's simple, but it's actually very powerful. Let's try our Self-Reliance example, with the words followed by commas.
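Before moving on, here is a sketch of the phone number check just described. The transcript gives neither the phone numbers nor the exact expression, so both are assumptions: the three numbers are invented, and the pattern is one way to combine a positive lookahead (digits zero to five), a negative lookahead (no 4321), and the number format, with `re.MULTILINE` standing in for the multi-line anchors option.

```python
import re

# Hypothetical phone numbers, one per line.
numbers = "212-555-0134\n555-123-4321\n555-867-5309"

# Assumed shape of the original: both lookaheads are positive.
must_contain = re.compile(
    r"^(?=[0-5-]+$)(?=.*4321)\d{3}-\d{3}-\d{4}$", re.MULTILINE
)
# Same expression with the second lookahead changed to a "not".
must_not_contain = re.compile(
    r"^(?=[0-5-]+$)(?!.*4321)\d{3}-\d{3}-\d{4}$", re.MULTILINE
)

with_4321 = must_contain.findall(numbers)
without_4321 = must_not_contain.findall(numbers)

print(with_4321)
print(without_4321)
```

The third number fails in both cases because it contains digits above five, so the first lookahead rejects it before the second is ever considered.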
I am just going to copy that Self-Reliance text again. Let's paste this in here, and the regex that we had written before was like this. It finds all words, then looks ahead positively to keep only the words that have a comma after them. If we wanted to find words that do not have a comma, well then we just change it like this. Now, it finds all those other words; words that do not have a comma after them. We could go ahead and put the comma inside a character set, and add a period in there as well. Now it's words that are not followed by a comma or a period. So I think you get the idea. I want to show you one last way that this can be really useful.
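The comma example can be sketched as follows. The Self-Reliance text is not reproduced in the transcript, so the sentence below is a made-up stand-in, and the word pattern `\b\w+\b` is an assumption about what the earlier regex looked like.

```python
import re

# Stand-in sample, not the course's actual text.
sample = "Speak your mind today, tomorrow, and always."

# Words followed by a comma (the positive version from before).
before_comma = re.compile(r"\b\w+\b(?=,)")
# Words NOT followed by a comma.
not_before_comma = re.compile(r"\b\w+\b(?!,)")
# Words followed by neither a comma nor a period.
not_before_punct = re.compile(r"\b\w+\b(?![,.])")

comma_words = before_comma.findall(sample)
no_comma_words = not_before_comma.findall(sample)
no_punct_words = not_before_punct.findall(sample)

print(comma_words)
print(no_comma_words)
print(no_punct_words)
```

The trailing word boundary matters: without it, the engine could backtrack and match part of a word such as toda in today, since the character after toda is not a comma.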
I am going to go back to my black dog text here, and let's put in our text again, black, that we are going to match. What if I wanted to find the last occurrence of the word black? It's not about the specific text that immediately follows it; I want the last one. That can be a very useful and powerful tool. Well, if you think about it logically, the last occurrence of the word black is the one where the word black isn't followed by itself. So all we have to do is write a negative lookahead expression that contains any number of characters -- I don't know how many -- followed by the word black; we'll put those word boundaries in again.
Now, it says, okay, I found black that doesn't have black coming after it. It's the last occurrence, in that line, of black. Even better, we can put parentheses around black here so that it is captured as a group, and then we can just use backslash one inside the lookahead to refer to that captured group. Remember how we did that when we learned about backreferences? So now it finds the word black, and any time that it's not followed by itself, it's the last occurrence. This is a very useful pattern, and one that's definitely worth remembering. Hopefully now you appreciate some of the power that lookaround assertions give you.
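The last-occurrence technique can be sketched in Python like this. Both patterns are assumed from the spoken description: the first spells out black twice, and the second captures it and reuses it through the backreference `\1`.

```python
import re

sentence = "The black dog followed the black car into the black night."

# "black" with no later "black" anywhere on the same line.
last_literal = re.compile(r"\bblack\b(?!.*\bblack\b)")
# Same idea, with the word captured once and referenced as \1.
last_backref = re.compile(r"\b(black)\b(?!.*\b\1\b)")

literal_starts = [m.start() for m in last_literal.finditer(sentence)]
backref_starts = [m.start() for m in last_backref.finditer(sentence)]

print(literal_starts)
print(backref_starts)
```

The backreference version is easier to reuse: changing the word inside the parentheses is enough to find the last occurrence of a different word. Because the dot does not match a newline by default, "last" here means last on the line.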
But we've only looked at one half of lookaround assertions: the lookahead assertions. That's looking forward. We also have the ability to look backwards by using lookbehind assertions, and that's what we'll look at in the next movie.