Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In this movie, I want to demonstrate how we can use regular expressions with Find & Replace to format or reformat text documents. This is the kind of situation where I find regular expressions extremely useful and very powerful, and I use them this way all the time. I want to demonstrate the exact steps and thought process that I typically go through. For this demonstration, I'm going to use Shakespeare's A Midsummer Night's Dream, and a copy of it is inside your exercise files. Now, when I first got this text, before including it in the exercise files, it was in pretty bad shape. It needed lots of formatting help, and I used the same regular expression techniques that I'm about to show you to make it into a well-formatted document.
Now, it's always going to be easier to write regular expressions for a well-formatted document than for one that's not. You can do it for both, but if the document is well formatted, there won't be as many exceptions to the rule that you have to account for. This document is now pretty well formatted. You can see that it has the title and the author at the top, then Act 1, Scene 1, the location of the scene, a blank line, some stage directions in parentheses, the name of the character who's about to speak followed by their lines, then a Return, the next character and their lines, and so on.
If you scroll down, you can see that it follows that same pattern all the way through. This consistency is what I mean by well formatted. So here's the situation. Let's imagine that we're working with a community theater that's going to put on A Midsummer Night's Dream. The director comes to us and says that she'd really like some changes to the formatting of the script before they print it out for all of the actors, but unfortunately, it would take too long to do by hand. Sure enough, if we scroll down, you can see that the script has over 3,400 lines. It would take a long time to make those changes throughout the whole document.
But using regular expressions, we don't have to. So we tell the director, we can make those changes for you; just tell us what you want. She says the first thing is that she'd really love it if the stage directions, instead of having parentheses around them, could have square brackets around them, and if each of them could have an extra line return before it. We say, great, that's going to be our first task, and we'll solve it using regular expressions. So let's change it back to the way it was. It's always a good idea to save a copy of the original, too, so that if we do horribly mess it up, we can go back to it. Now let's try writing a regular expression that will match the stage directions.
Now, in TextMate, the way I get to the regular expression search is from the Edit menu: choose Find, and then Find again from the submenu. Command+F is also the keyboard shortcut for this Find panel. Whenever you're making changes to a document, it's always a good idea to check Wrap around. That way you don't accidentally make changes to just the last half of the document without ever wrapping around to the matches back at the beginning. Now, we could simply find all of the opening parentheses and change them into square brackets. That would work, and it doesn't even require regular expressions, but we want to use the power of regular expressions to go a little further.
What if there were parentheses somewhere other than at the beginning of a line, ones that aren't stage directions? Without regular expressions, we couldn't tell those apart, but with regular expressions we can say we're only looking for the ones that are anchored to the start of a line. So we check the Regular expression option and put the ^ anchor in front of the parenthesis, and that should find them only when they're at the beginning of a line. We can click Next to have it actually find them. Let's jump here to the beginning. Also, I need to escape the parenthesis, so that the engine doesn't think it's opening a group. With the backslash in front of it, the pattern is ^\(, and there it is. Now it finds the first one, and the next one, and the next one. I find that it's always a good idea to first step through and find all the elements you're looking for.
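To make the search concrete outside the editor, here is a minimal sketch in Python using the standard `re` module. It is not how TextMate works internally; the sample text is a hypothetical stand-in for the script, and `re.MULTILINE` is an assumption needed in Python so that `^` matches at the start of every line rather than only at the start of the whole string. Listing the line number of each match mirrors the advice to step through the finds before replacing anything.

```python
import re

# Hypothetical stand-in for the script: one stage direction at the start of
# a line, and one parenthetical aside in the middle of a speech line.
script = (
    "ACT I\n"
    "\n"
    "(Enter SPEAKER)\n"
    "SPEAKER\n"
    "A line of speech (with an aside) in the middle.\n"
)

# ^  : the start of a line (re.MULTILINE makes it match after every "\n")
# \( : a literal "("; unescaped, it would open a group instead
opening = re.compile(r"^\(", re.MULTILINE)

# Step through the matches first: collect the line number of each one.
found = [script.count("\n", 0, m.start()) + 1 for m in opening.finditer(script)]
print(found)  # [3]; the mid-line "(" on line 5 is not matched
```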
Make sure that you're matching the correct things before you do any replacing. So now let's actually do some replacing. Let's jump back up to the top again, do our Find, and start with Next. We've found one; let's click Replace & Find, which replaces just that one and then goes on to find the next. Now let's look up here at the top, and sure enough, it did it correctly. Before we go any further, let's Undo that change. The director also asked for one other thing: an extra line return before each stage direction.
Now, we might as well accomplish both of these tasks at the same time. You'll notice that these stage directions have line returns before them: there's a line return here, another one here, and then the parenthesis. So instead of the anchor, we can match the line returns themselves. If there are two line returns before the parenthesis, we want three line returns before the bracket now. Again, let's jump back up to the top of the document and click Next. It found the match, including those line returns. Now let's do Replace & Find.
Let's check it, and sure enough, it added the extra line return that we wanted. It has already found the next one right here; click Replace & Find. Yup, each one is doing exactly what we wanted. I find it's a good idea to step through with Replace & Find a few times to make sure it's doing what you want. In the same way that we used Find at the start to make sure we were matching the right things, now we can make sure we're replacing them correctly. Then we have a choice: we can keep clicking Replace & Find over and over again, or, if we're confident that it's going to work, we can click Replace All. So we click Replace All, and it replaces 119 more occurrences.
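The habit of replacing one match, checking it, and then replacing the rest can be sketched in Python with `Pattern.subn`, which returns the new text along with the number of substitutions made. Its `count` argument limits how many matches are replaced, which is only a loose analogue of Replace & Find versus Replace All. The sample text is hypothetical, and the closing parentheses are deliberately left alone for a later pass.

```python
import re

# Hypothetical stand-in: two stage directions, each preceded by two line returns.
script = (
    "SPEAKER\n"
    "A line of speech.\n"
    "\n"
    "(Exit SPEAKER)\n"
    "\n"
    "(Enter OTHER)\n"
    "OTHER\n"
    "Another line.\n"
)

# Two line returns followed by a literal "(" ...
pattern = re.compile(r"\n\n\(")
# ... become three line returns followed by "[" (")" is handled later).
replacement = "\n\n\n["

# Replace only the first match, so the result can be checked by eye.
trial, n_trial = pattern.subn(replacement, script, count=1)

# Once satisfied, replace every match.
result, n_all = pattern.subn(replacement, script)
print(n_trial, n_all)  # 1 2
```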
At this point, you might want to skim the document to make sure there weren't any problems. Another thing I like to do is remove one of these line returns from the search, to test my assumption that every one of these parentheses had two line returns in front of it. Were there any that had only one? Let's check real quick. Hit Return, and sure enough, there are. It found a stage direction buried at the beginning of a line, and if we step through, we'll see that there are actually a few of those. So we ask the director what she wants to do here. The director says, well, I don't want a new line return in front of these.
I just want the parentheses changed into square brackets. So we take the search down to one line return, and when it finds a match, we just change the parenthesis into a square bracket. We can do Replace & Find, and Replace & Find again, and finally Replace All for the rest. That takes care of the opening parentheses. We've also got the closing parentheses at the end, and we want to make sure we get those too. We can do the same thing in reverse: search for the closing parenthesis followed by the end-of-line anchor ($), and change it to a closing square bracket.
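Here is a sketch of that follow-up pass in Python, on hypothetical text that has already been through the first replacement. Because every stage direction that had two line returns now begins with `[`, a single line return followed by `(` can only match the ones that sat directly under a speech line, and those get a bracket without an extra line return.

```python
import re

# Hypothetical text after the first pass: one converted stage direction,
# and one that sat directly under a speech line with a single line return.
script = (
    "ACT I\n"
    "\n"
    "\n"
    "[Enter SPEAKER)\n"
    "SPEAKER\n"
    "A line of speech.\n"
    "(Reads) A line that continues after the stage direction.\n"
)

# One line return plus a literal "(" becomes one line return plus "[";
# no extra line return is added for these.
result, n = re.subn(r"\n\(", "\n[", script)
print(n)  # 1
```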
The closing parenthesis needs to be escaped as well. Let's do Next, and there it is. It found that one; Next, and it found another; Next. The Find panel keeps hiding them underneath it, but there they are, so you can see that it is finding them. I'm going to go ahead and click Replace & Find, and let's double-check the ones it found. There they are, and it looks good. So let's click Replace All, and it replaces 120 of them. That takes care of one set of parentheses, but don't forget we had another kind: the stage directions that sat inside the lines.
Remember, we found a few of those. So let's take out the end-of-line anchor and search again, and you can see what I mean. Right here, we have a square bracket before Awaking, but a parenthesis at the end. To replace those, we can click Next and find a few of them. There aren't that many, so we can click Replace & Find and actually look at each one to make sure it's working. You can see that it did the right thing for Reads, and then finally we hit Replace All, and it did 10 more. I think there were 13 altogether. So now we've successfully accomplished our first task.
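The two closing-parenthesis passes can be sketched the same way in Python, again on hypothetical text. The first pattern, `\)$`, matches a literal `)` only at the end of a line (in Python, `re.MULTILINE` makes `$` match before each line return). The second drops the anchor, so it matches any remaining `)` anywhere in a line; that is why stepping through those matches first matters, since in a real script it would also convert a closing parenthesis that isn't part of a stage direction.

```python
import re

# Hypothetical text after the opening "(" of each stage direction became "[".
script = (
    "ACT I\n"
    "\n"
    "\n"
    "[Enter SPEAKER)\n"
    "SPEAKER\n"
    "A line of speech.\n"
    "[Reads) A line that continues after the stage direction.\n"
)

# Pass 1: a literal ")" at the end of a line becomes "]".
script, n_end = re.subn(r"\)$", "]", script, flags=re.MULTILINE)

# Pass 2: any ")" left over, wherever it sits in the line. Review these
# positions before replacing, because this pattern is much less specific.
leftovers = [m.start() for m in re.finditer(r"\)", script)]
script, n_mid = re.subn(r"\)", "]", script)

print(n_end, n_mid)  # 1 1
```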
It was a pretty simple task, just to get us started with Find & Replace, but I think you can already see how it saved us a lot of time and kept us from having to make these changes by hand, one by one. In the next movie, we'll move on to a second, harder task.