Using Regular Expressions
Illustration by Mark Todd

Formatting with Search and Replace, pt. 3


From:

Using Regular Expressions

with Kevin Skoglund

Video: Formatting with Search and Replace, pt. 3


5h 36m Intermediate Nov 21, 2011

Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.

Topics include:
  • Creating flexible patterns using character sets
  • Achieving efficiency when using repetition
  • Understanding different types of search strategies
  • Writing logical and efficient alternations
  • Capturing groups and reusing them with backreferences
  • Developing complex patterns with lookaround assertions
  • Working with Unicode and multibyte characters
  • Matching email addresses, URLs, dates, HTML tags, and credit card numbers
  • Using search and replace to format a document
Subject:
Developer
Software:
Regular Expressions
Author:
Kevin Skoglund

Formatting with Search and Replace, pt. 3

So, we have completed the first two tasks in formatting A Midsummer Night's Dream by using regular expressions. Now we are going to try the third, and harder, task. The director of the play comes to us and says that, ideally, she would like each of the actors' lines to be changed so that, instead of the character's name followed by a new line, the name is followed by a colon and a space. And each of the lines after that would be indented, 1, 2, 3, 4; four spaces, just like that. Now, to go through all 3000 lines of the play by hand would take a long time, but with regular expressions we don't have to.

So we've got a number of different kinds of lines that we are going to have to deal with. Let's do these one at a time. First, let's take care of matching the lines that are the character name, putting the colon after the name, and removing the line return that's there. We have seen something similar to this before. We just have to say that we want to match all of the capitals, A to Z, and one or more of them. Of course, that will find it, but we don't want to find it when it's inside the stage directions; we want to find it only when it's at the beginning of the line. So we can use our anchors for that, so that it matches only when the entire line consists of A to Z.
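As a rough sketch of that first step, here is the same idea in Python. The transcript works in a text editor's Find dialog, so the choice of Python's `re` module, the sample text, and the variable names are all assumptions made for illustration.

```python
import re

# Hypothetical excerpt in the layout the transcript describes: a character
# name alone on a line, followed by that character's lines.
SAMPLE = (
    "THESEUS\n"
    "Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace.\n"
    "[Enter EGEUS and HERMIA]\n"
    "HIPPOLYTA\n"
    "Four days will quickly steep themselves in night."
)

# Without anchors, [A-Z]+ also matches capitals inside other lines,
# including the names inside the stage direction.
unanchored = re.findall(r"[A-Z]+", SAMPLE)

# With ^ and $ in multiline mode, only lines made up entirely of
# capitals match.
NAME_LINE = re.compile(r"^[A-Z]+$", re.MULTILINE)
names = NAME_LINE.findall(SAMPLE)

print(names)
```

The anchors are what keep `EGEUS` and `HERMIA` in the stage direction out of the result.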

So there we are. We found -- let's jump back here -- Theseus, Hippolyta, Theseus, and so on. So let's try it, and it finds the first one, second one, and so on. So it is working. So that works, and gets just that line that we want. And we know how to replace it; we can just use our parentheses around it to capture it. We can use our backreference here to make sure that we find it, and then we are going to put a colon, and a space after it. And we also need to get rid of this line return, right? That line return has got to disappear. How do we get rid of it? Well, the way that we get rid of it is that we match for it, but then we don't put it in our replacement string down here.

We will leave it out, so it gets matched up here, but then it's not replaced when we replace it. Let me show you what that does. Let's jump up here at the top. We will find it, so here we are. Let's jump over here. Let's find the next one. We'll want to take out that trailing anchor now, and then we will say Return; here we are. It's finding the entire name, and the line return too. Now we hit Replace & Find. You can see that it did it right there. So now we have just a colon, and the space after it.

So the technique is this: if we want to remove a line return, we find it and make it part of the match, but not part of the capture, and not part of the replacement string. So let's just try the next one. Find & Replace, Find & Replace, and they're all working correctly. We can finally just say Replace All, knowing that we have got them all. It will take a second, and you can see it replaced 492 of them. So it would take a long time for us to do that by hand, but regular expressions made it really, really fast. So now let's try and accomplish the second portion, which is that we are going to try and put four spaces in front of each of these lines that is not a character name.
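The match-but-don't-capture technique can be sketched in Python as follows. This assumes the line return is a plain `\n` and uses Python's `\1` backreference syntax; the editor in the video may spell both differently, and the sample text is hypothetical.

```python
import re

# Hypothetical excerpt: each character name sits on its own line.
SAMPLE = (
    "THESEUS\n"
    "Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace.\n"
    "HIPPOLYTA\n"
    "Four days will quickly steep themselves in night."
)

# The name is captured; the line return is matched but left outside
# the capturing group.
pattern = re.compile(r"^([A-Z]+)\n", re.MULTILINE)

# \1 puts the captured name back, followed by a colon and a space.
# The newline is not in the replacement string, so it disappears.
joined, count = pattern.subn(r"\1: ", SAMPLE)

print(joined)
print(count)
```

`subn` is used instead of `sub` only because it also reports how many replacements were made, much like the count shown after Replace All.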

We know how to find a character name, right? We have already written the regular expression that will find the character name. Now what we want to do is find something that is not that character name, and incidentally, we can change the expression now, since each name now has a colon and a space after it. So how do we find something that is not that; that does not match that pattern? To do that, we need to use negative lookahead assertions. Remember, the powerful point of negative lookahead assertions is that they allow us to specify that something does not match a regular expression. So to do that, we put parentheses around the whole thing, and then we use question mark, exclamation point to say that what follows does not match this.

So what I am asserting is that the line does not start with that pattern. Let's jump here, and let me just use Command+G. It's a little easier to see when I do it this way, instead of from the other window, because we can see where the cursor moves to. Do you see it skip across? It skipped past the name of the character, down to the line below it, and it matches all of those lines. And then it skips across that name as well, and so on. Now that takes care of all of these blocks just fine, but what about the case in which I have stage directions? I don't want to put spaces in front of those, so I need to change my regular expression, and I am going to put parentheses around this portion, and put in an alternation.
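Here is a small sketch of that negative lookahead in Python, under the assumption that the names have already been rewritten as `NAME: `. The trailing `.*` is not part of the expression described in the transcript; it is added here only so the selected lines can be printed. The sample text is hypothetical.

```python
import re

# Hypothetical excerpt after the first replacement has been applied.
SAMPLE = (
    "THESEUS: Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace.\n"
    "HIPPOLYTA: Four days will quickly steep themselves in night.\n"
    "Four nights will quickly dream away the time."
)

# ^(?!...) succeeds at the start of a line only when the text that
# follows does NOT begin with capitals, a colon, and a space.
NOT_NAME_LINE = re.compile(r"^(?![A-Z]+: ).*", re.MULTILINE)

selected = NOT_NAME_LINE.findall(SAMPLE)
print(selected)
```

The two lines that start with a character name are skipped, which is the "skip across" behavior seen with Command+G.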

So the line can't start with a character name, and it also can't start with a square bracket. And of course, if we're putting a square bracket inside a character set, we need to use the backslash there, and the reason I am using a character set is because, if we jump up here, there's also another character we know we want to omit, which is the dash. Let's put that in as well. So if it's either a square bracket, or a dash, then we also want to ignore it. So there we go. I am skipping that one, it skips that one; however, it is matching these new lines.

It's matching those line returns in there. We don't want it to match those, so let's put in another alternation that says that it is not going to be a new line, and I noticed that it also matched up here for Act, Scene, and Location, which is going to occur a few times. So let's go ahead and say that it can't be those, and we can just put in the literal text. It can't start with Act, the line can't start with Scene, and the line can't start with Location. Now, it's possible that those words could be used inside the text, but I think it's unlikely. To make sure, I am going to go ahead and put a colon after them, and that will make sure that it really is in this context.
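Putting those exceptions together, the lookahead might look something like the sketch below. The exact expression typed in the video is not shown in the transcript, so this pattern is a reconstruction from the description; the sample text, including the `Act:`, `Scene:`, and `Location:` lines and the dashed separator, is hypothetical. As before, the trailing `.*` is there only to display which lines are selected.

```python
import re

# Hypothetical excerpt with every kind of line the transcript mentions.
SAMPLE = (
    "Act: I\n"
    "Scene: I\n"
    "Location: Athens. The palace of THESEUS.\n"
    "\n"
    "[Enter THESEUS and HIPPOLYTA]\n"
    "\n"
    "THESEUS: Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace.\n"
    "-----\n"
    "HIPPOLYTA: Four days will quickly steep themselves in night.\n"
    "Four nights will quickly dream away the time."
)

# Alternatives inside the negative lookahead, in order:
#   [A-Z]+:    a character name with its colon and space
#   [\[-]      a square bracket (escaped) or a dash
#   \n         an empty line
#   Act: | Scene: | Location:   literal headings, colon included
SPOKEN_LINE = re.compile(
    r"^(?![A-Z]+: |[\[-]|\n|Act:|Scene:|Location:).*",
    re.MULTILINE,
)

selected = SPOKEN_LINE.findall(SAMPLE)
print(selected)
```

Only the continuation lines of a speech remain once every exception is listed.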

And you could even put the space that should be after it as well. Alright, so now let's try it. Let's jump back up here. We will use Command+G, and it jumps straight down here to the word Draws. See that? So it's now finding the right words. I am using Find to make sure that I have got the right stuff, and you can just go through these so you feel confident about it. Now, let's go back to our Find. Once we feel confident, now we need to put in our replacement string. We know the replacement string is going to be four spaces, but how do we insert it there? If you remember when we talked about lookahead assertions, lookahead assertions are zero-width.

So since we only have a lookahead assertion here (that's all we have) it is zero-width. Therefore, the position of the cursor will be right after the beginning of the line; right after the start-of-line anchor. So we do our replacement here, space, space, space, space; four spaces. What it's going to do is take that zero-width match and replace it with four spaces, essentially doing an insert. We saw this back in the chapter on lookahead assertions. So let's try it. Let's jump back to the top here, and I am going to come right below this line, because the lines "A Midsummer Night's Dream" and "by William Shakespeare" would also match, so I am going to start below them. And let's do Next, and then let's do Replace. There, it worked.
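The zero-width insert can be sketched in Python like this. The pattern is the same reconstruction of the lookahead described above, not a copy of what was typed in the video, and the sample text is hypothetical.

```python
import re

# Anchor plus negative lookahead only: the match consumes no characters.
INDENT_POINT = re.compile(
    r"^(?![A-Z]+: |[\[-]|\n|Act:|Scene:|Location:)",
    re.MULTILINE,
)

# Hypothetical excerpt after the character names have been joined
# to their first lines.
SAMPLE = (
    "[Enter THESEUS and HIPPOLYTA]\n"
    "\n"
    "THESEUS: Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace.\n"
    "HIPPOLYTA: Four days will quickly steep themselves in night.\n"
    "Four nights will quickly dream away the time."
)

# Every match starts and ends at the same position.
zero_width = all(m.start() == m.end() for m in INDENT_POINT.finditer(SAMPLE))

# Replacing an empty match with four spaces acts as an insert.
indented = INDENT_POINT.sub("    ", SAMPLE)

print(zero_width)
print(indented)
```

Nothing is removed from the text, because there is nothing in the match to remove; the four spaces simply appear at each matching position.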

Replace & Find, Replace & Find; see how that works? And like I said, when you feel confident about it, and you feel like it's doing what you want, then you can just hit Replace All. Now, Replace All is going to wrap around, and of course, it is going to match these. I could have put them in as an exception, or since I know they are at the top of the document, I can also just take them out by hand. So now we can scroll down, and we can see that we have very much changed the formatting of the document. And we did it for every single line, all of the way through, very consistently, and we didn't have to enter them all by hand, all by using the power of regular expressions.
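The two ways of protecting the title lines mentioned above could look roughly like this in Python. Both are sketches under assumptions: the title lines are taken to be exactly the first two lines of the text, the lookahead is shortened to keep the example small, and all names are hypothetical.

```python
import re

# Shortened version of the lookahead, enough for this excerpt.
INDENT_POINT = re.compile(r"^(?![A-Z]+: |[\[-]|\n)", re.MULTILINE)

# Hypothetical top of the document.
SAMPLE = (
    "A Midsummer Night's Dream\n"
    "by William Shakespeare\n"
    "\n"
    "THESEUS: Now, fair Hippolyta, our nuptial hour\n"
    "Draws on apace."
)

# Replace All over the whole text also indents the two title lines.
everything = INDENT_POINT.sub("    ", SAMPLE)

# Option 1: list the title lines as extra exceptions in the lookahead.
WITH_EXCEPTIONS = re.compile(
    r"^(?![A-Z]+: |[\[-]|\n|A Midsummer Night's Dream$|by William Shakespeare$)",
    re.MULTILINE,
)
option_1 = WITH_EXCEPTIONS.sub("    ", SAMPLE)

# Option 2: leave the first two lines alone and process only the rest,
# which is the scripted equivalent of fixing them by hand.
first, second, rest = SAMPLE.split("\n", 2)
option_2 = "\n".join([first, second, INDENT_POINT.sub("    ", rest)])

print(option_1 == option_2)
```

Either route gives the same text here; which one is more convenient depends on how many such one-off lines a document has.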
