Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
We've covered the fundamentals of lookaround assertions, and we've seen that essentially they allow testing of a regular expression apart from matching. This simple fact gives us a lot of functionality; some very cool features that we can use. We've seen that we can peek forwards and backwards to see whether something is true. We can match a string by using multiple expressions. We saw how we can define rejection expressions; things that should be excluded from our match, and we even saw that we could find the last occurrence of something. Now I want us to talk about a subtle but powerful aspect that we haven't really covered in depth yet, and that is the power of positions.
If you'll remember, all of these assertions are zero-width. Hopefully I've drilled that fact into your head by now, and zero-width means zero position movement. Here is what I mean by that. Let's say we have our original example, where we have a positive lookahead assertion for seashore, followed by sea. That would match the sea in seashore, but not sea on its own. Now remember, what happens is the regular expression engine first matches our assertion, and when it's done, it rewinds back to the beginning. So at that point, it has no position movement; the position has not changed at that moment when the assertion ends.
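Here is a minimal sketch of that seashore example, using Python's re module as a stand-in for whatever engine you have at hand. The sample text and variable names are just assumptions for illustration. It shows that the lookahead checks for seashore, rewinds, and then only sea is consumed.

```python
import re

# Sample text (an assumption): one bare "sea" and one "seashore".
text = "sea seashore"

# Positive lookahead for "seashore", followed by the literal "sea".
m = re.search(r"(?=seashore)sea", text)

matched = m.group()  # only "sea" is consumed; the lookahead adds no width
span = m.span()      # where in the text the match sits

print(matched, span)
```

The bare sea at the start is skipped, because the lookahead fails there; the match is the sea that begins seashore.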
Let me give you another example. Here, we are going to use a lookbehind assertion. What I am going to do is I am going to have 54.00, and dollar sign 54.00. Those are my two bits of text. What I have is a lookbehind assertion that looks behind, and makes sure that I don't have a dollar sign or a digit, and then I do have some digits, followed by a decimal, and two more digits. So that will match 54.00, but not when it has that dollar sign in front of it. That negative lookbehind makes sure that we don't match the one that has the dollar sign there.
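As a rough sketch, the pattern described here could be written like this in Python's re module. The exact spelling of the pattern and the variable names are my assumptions based on the description. The last line shows what happens if the digit is left out of the character set.

```python
import re

# Negative lookbehind: no dollar sign or digit immediately before,
# then one or more digits, a literal period, and two more digits.
pattern = re.compile(r"(?<![$\d])\d+\.\d\d")

plain = pattern.findall("54.00")    # no dollar sign in front
priced = pattern.findall("$54.00")  # dollar sign in front

# Same idea, but the lookbehind only rejects a dollar sign.
loose = re.findall(r"(?<!\$)\d+\.\d\d", "$54.00")

print(plain, priced, loose)
```

With only the dollar sign in the lookbehind, the engine can start one character later, at the 4, which is preceded by a 5 and not by a dollar sign.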
We included the digit with that dollar sign to make sure that it also doesn't match 4.00. Without it, the engine would just start one character later, and take any digits that it can. All right! Now imagine for a moment, what if we took that second expression -- the backslash, d, plus, backslash, period, backslash, d, backslash, d. What if we took that part, and put that inside a positive lookahead assertion, like this? At the end of that, what gets matched? We have two assertions; a lookbehind assertion, and a lookahead assertion. Nothing gets matched, because neither one of those assertions includes any text in the match.
But the regex engine does find a match; it does succeed and say, ah! Both of these assertions are true, but I have a zero-width match. So it matches, but the final match is zero-width. More importantly, though, where is the regular expression engine pointer at the end of this, after it makes the match? It rewinds once it's done its backwards looking; it rewinds once it's done its forwards looking. So at that point, the regex engine's pointer is sitting right in front of the 5 for 54.00. Do you see that? Now, why does it matter that we've matched zero-width, and the pointer is now sitting at that place, and the regex engine thinks that it's accomplished its job? Well, as I said, it's a subtle point, but it's very powerful, because this is very useful for inserting text by using Find and Replace.
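To make the zero-width match concrete, here is a small sketch in Python's re module. The sample sentence and the names are assumptions. Both assertions succeed, the matched text is empty, and the match position is the spot right in front of the 5.

```python
import re

# Lookbehind and lookahead only; nothing outside the assertions.
pattern = re.compile(r"(?<![$\d])(?=\d+\.\d\d)")

text = "This costs 54.00"
m = pattern.search(text)

matched = m.group()            # empty string: the match has zero width
start, end = m.start(), m.end()
next_char = text[start]        # the character the pointer sits in front of

print(repr(matched), start, end, next_char)
```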
We've located a position, the match that we are going to replace is zero-width, but we are going to replace it with something that does have some width. So essentially, that's the same thing as inserting. Going to a position, not selecting any width, but putting something in that place. Pretty cool, huh? Let's try it out. And because we are going to be using Find and Replace, I am going to be showing this to you in TextMate. Let's say that we've got a simple sentence; This costs 53.00, or 54.00 with a dollar sign in front of it. I am going to open up my Find.
I'll do that with Command+F, and for my regular expression here, let's put in a negative lookbehind. So not anything in the character set, dollar sign, or digit. And then once that lookbehind is done, let's just look for some digits, backslash, period, backslash, d, backslash, d. So it matches 53.00, but it does not match the 54.00 with the dollar sign. So that one is just a full regular expression. We've seen that before. Now what we want to do is turn that second part into a positive lookahead assertion.
So now, we're just asserting that this ought to be true. Notice where my cursor went? Let me move it, just so you see it again. It jumps right in front of the 53. So now, if we say Replace All, boom! Look at that. It just dropped in our dollar sign right in front of the five. Let's try another example. I am going to put in a sentence here. An astronomical unit is, this very long number of kilometers; approximately the average distance between the Sun and Earth. What we want to do is add commas to delimit this number.
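The same Replace All can be sketched outside TextMate. This version uses Python's re module, which is an assumption on my part, and the sentence is my spelling-out of the one typed in the video. Replacing the zero-width match with a dollar sign amounts to an insertion.

```python
import re

# Zero-width pattern: not preceded by "$" or a digit,
# and followed by digits, a period, and two digits.
pattern = re.compile(r"(?<![$\d])(?=\d+\.\d\d)")

text = "This costs 53.00, or $54.00."

# subn returns the new string and how many replacements were made.
result, count = pattern.subn("$", text)

print(result, count)
```

Only the price without a dollar sign gets one; the price that already had it is left alone.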
So after every three digits, there should be a comma. So how can we accomplish that? Let's go here to our regular expression. We know we are going to want to replace it with a comma. Our Find, though; we are going to want to find three digits in a row. We are going to group those together, and repeat those, and that should then find sets of three digits. Ah! But, look at that; it found the whole entire thing. What we want to do is tell it to look behind, and make sure that there is a digit in front of it. No reason to delimit the first three. We only want to delimit it if there's a digit in front of it. At the end, we're going to have a negative lookahead assertion that makes sure there isn't another digit after it.
So now we are saying, all right; find the sets of digits that have a number in front of them, and don't have a number after them, and in sets of three. Let's take that whole thing, and now let's turn this expression right here -- this middle part -- into an assertion as well. So now it will have zero-width, and actually we want to include this in the assertion as well; that it has no digit after it, and now let's do a Find. There it is! So it found the spot after the first three digits, and the spot after the second three. If I just keep going back and forth, you see that's the two spots that it found, and so now if I do my insertion in each of those spots -- let's do Find, and let's do Replace All -- now we get our comma-delimited number.
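Here is a sketch of both steps in Python's re module. The nine-digit number is a stand-in I picked for the long number in the sentence, and the pattern spellings are my reading of the description, so treat them as assumptions. The first pattern consumes the digits; the second is zero-width and only locates the spots where a comma belongs.

```python
import re

# Stand-in number (an assumption), with no delimiters yet.
text = "An astronomical unit is 149597871 kilometers."

# Step 1: a digit behind, sets of three digits, no digit after.
# This version still consumes the digits it finds.
consuming = re.search(r"(?<=\d)(\d{3})+(?!\d)", text).group()

# Step 2: move the middle part and the negative lookahead inside a
# positive lookahead, so the whole pattern is zero-width.
spots = re.compile(r"(?<=\d)(?=(\d{3})+(?!\d))")

# Replacing each zero-width match with a comma inserts the delimiters.
result = spots.sub(",", text)

print(consuming)
print(result)
```

Each spot qualifies only if the digits remaining after it split evenly into groups of three, which is why the commas land every three digits counting from the right.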
So as I said, this is a very powerful behavior that arises from a very subtle aspect of the way that the zero-width assertions work.