Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
We're going to begin learning the syntax that we need to write regular expressions by starting with the simplest match of all, a literal character. In other words, the letter A matches the letter A. To do this, we're going to be working with strings. Strings are just several characters that have been strung together. So let's say that we have the string car, which is in the double quotes there, and we have the regular expression car. /car/ matches car. /car/ also matches the first three letters of carnival. It doesn't match the entire string; it just matches those first three letters.
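As a minimal sketch of that literal match, assuming Python's built-in re module rather than the tester used in this course (the variable names here are only for illustration):

```python
import re

# A literal pattern: the characters c, a, r, one after another.
pattern = re.compile(r"car")

# search() scans the string and returns the first match it finds, or None.
whole = pattern.search("car")
partial = pattern.search("carnival")

# span() gives the (start, end) positions of the matched text.
print(whole.group(), whole.span())
print(partial.group(), partial.span())
```

In both cases the matched text is just car. For carnival, the span covers only the first three letters, not the whole string.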
It's very similar to searching inside a word processor. It's the simplest match there is. It just says look for literal car inside the string and tell me if you get a match or not. Now although it seems really simple, there are a couple of points that I want us to cover. The first is that these searches are case sensitive by default. That may not be the way your word processor works. So here, if we had capital CAR as a string, well lowercase car as an expression would not match it, because it's case sensitive by default.
Now my advice is to write all of your expressions as case sensitive because all engines support it all of the time, and you don't have to worry about putting it into case-insensitive mode. I think most people write them as case-sensitive expressions. When you get to character classes, we're actually going to learn how to write expressions which match both upper- and lowercase, which is basically the same thing as being case insensitive. Let's try it out. So in regexPal, let's just start by entering our test data here, car, and up here we'll put car. Notice that it highlights exactly that part of it.
If I had carnival, you see that it doesn't highlight the part that didn't match, so it's really indicating to me, with highlighting, exactly what part matches and what part doesn't. Notice that if I change it to Carnival capital, it doesn't match anymore. The case sensitivity does matter. I can click this box up here for case insensitive and now it does match again, because now the case doesn't matter anymore. Here case does matter. If we were to put a capital C in the expression, well then it does match capital Carnival and doesn't match lowercase carnival, again, unless case-insensitive mode is turned on.
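The same experiment can be sketched outside the tester. This assumes Python's re module, where the re.IGNORECASE flag plays the role of the case-insensitive checkbox:

```python
import re

# Case sensitive by default: lowercase car does not match capital Carnival.
sensitive = re.search(r"car", "Carnival")

# With the case-insensitive flag on, the same expression matches again.
insensitive = re.search(r"car", "Carnival", re.IGNORECASE)

# A capital C in the expression matches Carnival but not carnival.
capital_on_capital = re.search(r"Car", "Carnival")
capital_on_lower = re.search(r"Car", "carnival")

print(sensitive)            # no match object when nothing matches
print(insensitive.group())  # the text that matched, as it appears in the string
print(capital_on_capital.group())
print(capital_on_lower)
```

Note that the case-insensitive match returns the text as it appears in the string, so the matched text is Car with a capital C.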
So that's how the case sensitivity works. Now I also want to mention that whitespace does matter. So for example if we have car matching c a r, with spaces between the letters, this does not match. It's actually looking for those literal three characters in a row, and you kind of would expect that from your experience with a word processor. So having returns, things like that, don't match; it has to literally match those three characters, one after another. Now remember that there's also this Global mode up here. Let's talk a bit about it. By default in regexPal, Global is turned on, but by default in regular expressions, it's not turned on.
That box is normally not checked. That's a modifier that regexPal is adding for us. Inside our tester, it won't make much of a difference, but let's understand what it is. In standard mode, or non-global mode, or non-global matching, the earliest, or leftmost match, is always preferred. So, as it's reading from left to right, when it comes to the first match, that's the one that it's going to prefer. So, for example, if we have the word pizzazz, zz would match the first set of Zs in pizzazz, not the second set. It's the leftmost set that's going to match.
In Global matching, it'll match all matches that are found throughout the text. So zz in Global mode would match both sets of Zs in pizzazz. Let's try that out. P-I-Z-Z-A-Z-Z. I've got the Global off. Let's type a zz up there. It found the first one. If I put a check there, and now it finds both of them. And it actually alternates the colors between yellow and blue to show me the difference between the different matches. So again, there's the single one and there's the second one. So we're starting to get some indication that the regex engine actually reads from left to right as it traverses through our string.
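Here is a rough equivalent of the pizzazz experiment, assuming Python's re module. Python has no Global checkbox, so this sketch uses re.search for the standard, leftmost-only behavior and re.finditer as a stand-in for Global mode:

```python
import re

text = "pizzazz"

# Non-global: only the earliest (leftmost) match is returned.
first = re.search(r"zz", text)

# Global-style: collect every non-overlapping match, left to right.
all_spans = [m.span() for m in re.finditer(r"zz", text)]

print(first.span())
print(all_spans)
```

The single search reports the first set of Zs at positions 2 to 4. The global-style search reports both sets, at 2 to 4 and 5 to 7.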
Let's take a closer look at that. Bear with me here if it seems a little obvious, because there are some points that I want to make. Let's say we have the string, "The cow, camel and cat communicated," and the regular expression we're going to use is /cat/. The first thing the regular expression engine does is it goes to the first character in the string and it says, does the T that's here match the first part of this regular expression? No, there's no match there. c and T are different, so it goes for the next character. Do c and h match? No, they don't. And it keeps moving along until it gets to the c in cow. Then it says, Ah! I now have a match.
Maybe this is the beginning of a pattern that matches, let me see. So now, it goes to the next character and says o. Does o match the next part of the expression? No, it does not. So therefore, it now says, that's not a match, and it proceeds to start again, this time starting with the o. Does o match the c? No, it doesn't. Then it keeps going down the line. Now let's say we get to camel. Now it says ah, c, that's a match. Ah! a, that's a match too, and then it moves to the next letter and says oh, that's not a match.
All right, so now it's not a match anymore. It doesn't keep searching from the m; instead, what it does here is it backtracks to the a. You see why it does that? Just in case the pattern would've matched on the a, it goes back. So it didn't match starting with the c. "cam" was not a match. So now the next place it's going to try, it's going to try ame. So it's going to try those three letters. Those aren't a match, and it will keep going, and then it'll finally get to cat and it'll match the c, the a, and the t, and it will say all right, now I've got a match.
Now if we were in Global mode, then it would keep going and the next place it would try would be at the space. The space would be the very next thing that it would try, and then move along. The c matches but o doesn't, so then it starts trying again just like before, until finally it works its way along to cat, and then it matches that for the second time. So I know that's overly simplistic, and you might have been able to guess a lot of that, but I wanted to point out especially that backtracking that it does, because that's going to be important later on. When we have complex expressions, it's going to do that backtracking like it did when it got to the word camel.
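The walk-through above can be sketched as a small hand-written matcher. This is only an illustration of the left-to-right scan and the step back after a failed attempt, not how any real regex engine is implemented. The function name find_literal and its global_mode parameter are made up for this example.

```python
def find_literal(pattern, text, global_mode=False):
    """Return the start positions where the literal pattern occurs in text."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    matches = []
    start = 0
    # Try each starting position from left to right
    # (positions too close to the end to fit the pattern are skipped).
    while start + len(pattern) <= len(text):
        matched = 0
        # Compare one character at a time, like c, then a, then t.
        while matched < len(pattern) and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            matches.append(start)
            if not global_mode:
                break  # standard mode: return the earliest match right away
            start += len(pattern)  # Global mode: continue just after the match
        else:
            # Failed attempt (for example "cam"): go back and retry from
            # the character right after where this attempt started.
            start += 1
    return matches


text = "The cow, camel and cat communicated"
first_only = find_literal("cat", text)
every_match = find_literal("cat", text, global_mode=True)
print(first_only)
print(every_match)
```

In standard mode the sketch stops at the word cat, at position 19. In Global mode it carries on from the following space and also finds the cat inside communicated, at position 30.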
So it's easy to see here, but it'll pay off when we start looking at more complex things later on. One important principle that also comes out of this that you need to understand is that regular expressions are eager. So keep that rule in mind: regular expressions are eager. They are eager to return a match to you. They want to return one as fast as possible. The concept of the earliest match being preferred is an important one. You'll hear me mention eagerness several times throughout this training. It seems simple here, but when we start writing complex expressions, this eagerness is going to play a big role.