Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In this movie, we're going to be learning about lookbehind assertions, and the metacharacters that we'll use for those are question mark, less than sign, equals for a positive lookbehind assertion, and question mark, less than, exclamation point for a negative lookbehind assertion. A lookbehind assertion is an assertion of what ought to be behind. It's very similar to what we had with lookahead assertions. If the lookbehind expression fails, then the match will fail. Any valid regular expression can be used inside there, and just like the lookahead assertions, they are going to be zero-width; they don't include the group in the match.
It's just an assertion about what ought to be true. The syntax is also very similar. Inside a group's parentheses, the very first characters will be question mark, less than, and the equal sign, followed by your regular expression; or question mark, less than, exclamation point, followed by your regular expression. Don't let the fact that we now have three symbols throw you. They are the same as the lookahead assertions, but with a less than sign tossed in the middle. You should think of it as an arrow that's pointing backwards. So the question mark indicates a change to the group's meaning, the arrow or less than sign tells us to look backwards, and then the equal sign says that it's a positive assertion, or the exclamation point says that it's a negative assertion.
The way it would actually look in practice would be something like this. We're going to look behind us for the word base, and match the word ball when that's true. So it will match the word ball in baseball, but not in football. The way that the regex engine actually parses this is that as it's going through the word baseball, on every single letter, it stops, and looks backwards to see if it finds the word base. If it doesn't, then it keeps moving. And once that condition is satisfied -- that's the point at which we get right between the e and the b in baseball -- it looks backwards, and it says, okay, I see base now.
Now this is true, therefore, now I'll check the next part of the expression, and see if I have ball, and it does, and so we have a match on the word ball. Now, like the lookahead assertions, you can flip it around, and put the lookbehind assertion at the end. However, typically you don't do this. Most of the time you put it at the front, for efficiency: at the end, the engine would just be moving backwards over text that it has already matched the first time. Typically, you would either match the text the first time, or use a lookahead assertion to make sure that it matches the second pattern, instead of going back through it a second time with a lookbehind assertion.
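To make the baseball and football example concrete, here is a minimal sketch in Python, assuming the standard `re` module as the engine (the course itself does not name one). The sample string and the variable names are made up for illustration. The spans show that the assertion is zero-width: only ball is in the match, never base.

```python
import re

# Hypothetical sample text for illustration.
text = "baseball football"

# Positive lookbehind: match "ball" only when "base" sits immediately before it.
positive = re.compile(r"(?<=base)ball")
# Negative lookbehind: match "ball" only when "base" does NOT sit immediately before it.
negative = re.compile(r"(?<!base)ball")

pos_match = positive.search(text)
neg_match = negative.search(text)

# Each span covers only "ball"; the text checked by the assertion is not included.
print(pos_match.group(), pos_match.span())  # ball (4, 8)   -> inside "baseball"
print(neg_match.group(), neg_match.span())  # ball (13, 17) -> inside "football"
```

The positive pattern matches at index 4, right between the e and the b in baseball, which is the point where looking backwards first finds base.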
Now, when I say simple expressions, what I mean is fixed length. If you think about it, everything in regular expressions so far has been about moving forward through strings, or rewinding. Well now we are talking about doing something different. We're talking about putting it in reverse, and going the opposite way looking for a string. That adds a whole new layer of complexity. It may seem like, oh, it's just simply going the other way, but it's kind of like driving a car. Everything about the car is built to go forwards. Now, you can go backwards, but it's a little bit harder to do, because the car is mostly designed to go forwards.
So it requires more effort on the part of the driver, and you typically have to go slower if you are going to go in reverse. It's the same thing with the lookbehind assertions. We can use literal text, and we can use character classes, because they represent a fixed length; just a single character. But we typically can't use repetition, or optional expressions, because those are not fixed length. We saw, when we were first learning about repetition, that there's a lot of inefficiency, and a lot of moving and rewinding that goes on when we start having repetition. We can use alternation, but only with fixed length items, and that's for the exact same reason.
So, for example, we can look back, and find whether something is preceded by cat, dog, or rat, but we cannot check and see whether it is preceded by apple, banana, or plum, because each of those has a different length to it. Now, there are two notable exceptions here, which are that Java does allow you to use repetition and optional items, and .NET allows repetition, and optional expressions, as well as alternation with non-fixed length items. But as a general rule, your lookahead assertions can be very complex, but you should make your lookbehind assertions very simple.
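The fixed-length rule can be checked directly in an engine that enforces it. The sketch below assumes Python's `re` module, which rejects variable-width lookbehind when the pattern is compiled; other engines draw the line in different places. The helper `compiles` and the words fish, pie, and cowfish are hypothetical, added only for the demonstration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of equal length (three characters each) are accepted.
same_length = r"(?<=cat|dog|rat)fish"
# Alternatives of different lengths are rejected by Python's re module.
mixed_length = r"(?<=apple|banana|plum)pie"
# Repetition and optional items have no fixed length, so they are rejected too.
repetition = r"(?<=a+)b"
optional = r"(?<=ab?)c"

print(compiles(same_length))   # True
print(compiles(mixed_length))  # False
print(compiles(repetition))    # False
print(compiles(optional))      # False

# The accepted pattern behaves as described: "fish" after cat or dog, not after cow.
print(re.findall(same_length, "catfish dogfish cowfish"))  # ['fish', 'fish']
```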
Baseball and football each match. Now let's put in our lookbehind, and let's say, only when it is base in front of it. So now it matches ball, and that's the only one that it matches. If I hit Command+G to Find again, it only finds one of them. Same thing, of course, if we switch to the negative version: now it finds the ball in football, and only football; not baseball anymore. Let's try another example. Let's say I have some names here. Let's have Benny, Benjamin, Jenny, and Lenny, and let's do a Find, and let's first just find J, A, M, I, N, or N, Y.
I am using an or operator there. So what I am essentially doing is trying to find the thing that comes after that first part of the name. So it finds all of those. Let me open up Find again, and this time, let's say our lookbehind, and let's say we want to look behind it only if you find Ben in front of it. So now it finds the N, Y in Benny, and the J, A, M, I, N that comes after Ben, but it does not find Jenny or Lenny. Now, we can use alternation, again, fixed length here.
Now it finds it for Benjamin, for Jenny, and for Benny, and of course we have the negative version of that, which would find it for Lenny, but not for the other three. So again, the concept is very similar to what we had for lookahead assertions. The only big difference is that it's not as widely supported, and we need to keep our expressions simple.
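The names example can be reproduced as a rough sketch in Python's `re` module. The transcript does not spell out the alternation that was typed into the Find box, so `Ben|Jen` below is an assumption chosen to give the results described; the helper `names_matching` is also hypothetical.

```python
import re

names = ["Benny", "Benjamin", "Jenny", "Lenny"]

def names_matching(pattern: str) -> list[str]:
    """Return the names in which the pattern finds a match."""
    return [name for name in names if re.search(pattern, name)]

# The ending of the name, with no assertion: every name matches.
plain = r"(jamin|ny)"
# Only when "Ben" comes immediately before the ending.
after_ben = r"(?<=Ben)(jamin|ny)"
# Fixed-length alternation inside the lookbehind (assumed to be Ben|Jen).
after_ben_or_jen = r"(?<=Ben|Jen)(jamin|ny)"
# Negative version of the same assertion.
not_after_ben_or_jen = r"(?<!Ben|Jen)(jamin|ny)"

print(names_matching(plain))                 # ['Benny', 'Benjamin', 'Jenny', 'Lenny']
print(names_matching(after_ben))             # ['Benny', 'Benjamin']
print(names_matching(after_ben_or_jen))      # ['Benny', 'Benjamin', 'Jenny']
print(names_matching(not_after_ben_or_jen))  # ['Lenny']
```

Both alternatives in the lookbehind are three characters long, which is why an engine with the fixed-length restriction accepts them.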