Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In the last movie we learned about the grouping metacharacters, and those are going to be really useful as we turn our attention now to talk about alternation. The alternation metacharacter is just a single upright line. Take a second to locate that on your keyboard. On most keyboards it's going to be above the Return key. This is often referred to as the pipe character, because in UNIX its function is to pipe output from one command to another. In regular expressions, you'll usually hear me read it as "or," but if you hear me say pipe or pipe character, that's what it means--even though it serves a different purpose in regular expressions than it does in UNIX.
Now in some programming languages they use double pipes to represent "or." Be careful if you're familiar with those that you only use a single pipe here. So as I said, the pipe character is an or operator. It means either match the expression on the left or match the expression on the right, and it's ordered; it does them in that order: the leftmost first, then the one on the right. And you could have more than just two. You can daisy-chain them together, so we can have option one or option two or option three and so on. We can also use those grouping metacharacters that we just learned to group our alternations to keep them distinct from the rest of our expression.
I think that's a really good practice. Most of the time you're going to want to do that so it's really clear where the alternation starts and ends, because it's got those parentheses around it. It's also going to be important in some cases, because it changes the meaning if we don't have them. We'll see an example of that. Let's start by looking at some simple examples, if we just have apple or orange. So it matches the literal characters a-p-p-l-e or the literal characters o-r-a-n-g-e. Notice that those two strings do not have the same length. The parser is able to handle the fact that they're two different lengths.
It looks for the word apple, and if it can't find that, then it backtracks and looks for the word orange. If we have them daisy-chained together-- let's say we have abc|def|ghi|jkl--then it will match any of those four combinations. As I was saying about putting the grouping operators around it to make it clear, apple followed immediately by either juice or sauce is not the same thing as apple juice or sauce--two separate things. One of the simplest use cases that you can use this alternation for is to find misspelled words.
So if you wanted to search a document for weird and you wanted to find it both spelled correctly and incorrectly, well, you could put inside your group expression an alternation ei or ie. Let's try these out. So to start with, I'm just going to paste in some text here, just so you can see. I've got apple and orange, appleorange all run together, an apple with an upright pipe orange, and that's my string. Let's also use apple|orange as a regular expression, and you can see what it matches. It matches apple or orange. And appleorange, it matched but only because it matched the two parts of that, apple and orange.
It didn't match the whole word together, and the same goes for the one with the pipe in the middle. I just want you to see that that operator is doing something that's not the literal character. If we put the escape in front of it, of course then it matches the literal version. Let's try another example. Let's just put the alphabet in here, and let's look for abc|def|ghi|jkl. See, it matches each one of those. Now we've got Global turned on. If I turn that off, you see it just matches the first one. It started at the beginning, and it said, all right, do I see an abc. Oh! I do. Great! I'm all done. If it hadn't found it, if let's say this had been acc, well, then it would try again.
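The behavior described so far can be reproduced outside the on-screen tester. What follows is a minimal sketch in Python's `re` module, not the tool used in the movie; the sample strings are approximations of the text pasted on screen, and `search` versus `findall` are assumed here as stand-ins for Global being off or on.

```python
import re

# Approximation of the on-screen sample text (an assumption, not the exact string).
text = "apple orange appleorange apple|orange"

# Unescaped pipe: alternation between the two literal words.
alternation = re.findall(r"apple|orange", text)
print(alternation)  # apple and orange are each found three times

# Escaped pipe: matches the literal character, so only "apple|orange" is found.
literal = re.findall(r"apple\|orange", text)
print(literal)

alphabet = "abcdefghijklmnopqrstuvwxyz"
pattern = re.compile(r"abc|def|ghi|jkl")

# search() stops at the first match, like the tester with Global turned off.
first = pattern.search(alphabet).group()

# findall() keeps going, like the tester with Global turned on.
all_matches = pattern.findall(alphabet)

# If the text starts with "acc" instead of "abc", the first match is "def".
shifted = pattern.search("accdefghijkl").group()

print(first, all_matches, shifted)
```

In `appleorange` the two words are still matched separately; nothing in the pattern asks for them to be joined into one match.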
The engine would keep moving until it got to the def. So it just moves along each one. It says, did this one succeed? No, it did not. Did this one succeed? Yes, it did. So now I have a match, and we never execute what's after that. We never even bother looking at it. We've got a match, and we can move on. Now as I said, grouping is always a good idea. Let's try our applejuice and applesauce example, and try it this way first: apple, then juice or sauce, with no parentheses. So it's either juice or sauce. Let's turn back on Global so that you can see it.
It found the word sauce because we said sauce was a valid choice, but it did not find applesauce. If we put the parentheses around it, now it finds what we intended, apple immediately followed by either of our two expressions: either juice or sauce. Now, I've been using just literal text characters in each of these expressions here, just to make it simple so you can see, but these can be anything you want. These can be absolutely any expression. They can be expressions that include more alternations. They can have character sets, repetition--all that stuff is allowable here.
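The difference the parentheses make can be checked with a short sketch, again in Python's `re` module rather than the on-screen tester. The sample text is an assumption chosen for illustration. `finditer` with `group()` is used so that the whole match is reported, because `findall` would return only the contents of the parenthesized group.

```python
import re

# Hypothetical sample text for illustration.
text = "applejuice applesauce juice sauce"

# Without parentheses the alternatives are "applejuice" and "sauce".
ungrouped = [m.group() for m in re.finditer(r"applejuice|sauce", text)]

# With parentheses the alternation is limited to what follows "apple".
grouped = [m.group() for m in re.finditer(r"apple(juice|sauce)", text)]

print(ungrouped)  # "sauce" is matched on its own, even inside "applesauce"
print(grouped)    # only "applejuice" and "applesauce" are matched
```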
The main principle here is just that it's expression one or it is expression two. Last of all, let's just try those misspelled words. Let's say that we have weird or wierd and then up here w(ei|ie)rd. It matches both possibilities. So that's the fundamentals of working with alternations. In the next movie, I want us to go a little bit deeper and talk about the way that the regular expression engine parses through these and also make sure that we know how to write good logical alternations.
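The misspelled-word search from this movie can be sketched the same way. This is a minimal Python example with a made-up sentence, not the text used on screen.

```python
import re

# Hypothetical sample text containing both spellings.
text = "That is weird. Some people write wierd instead."

pattern = re.compile(r"w(ei|ie)rd")

# group() is the whole match; group(1) is whichever alternative succeeded.
spellings = [(m.group(), m.group(1)) for m in pattern.finditer(text)]

print(spellings)
```

Each result pairs the matched word with the alternative inside the group that produced it, which shows that both orderings of the two letters are accepted.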