Writing logical and efficient alternations


From:

Using Regular Expressions

with Kevin Skoglund

  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 27s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 31s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 39s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 47s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 32s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 11s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s

Watch the Online Video Course Using Regular Expressions
5h 36m Intermediate Nov 21, 2011


Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.

Topics include:
  • Creating flexible patterns using character sets
  • Achieving efficiency when using repetition
  • Understanding different types of search strategies
  • Writing logical and efficient alternations
  • Capturing groups and reusing them with backreferences
  • Developing complex patterns with lookaround assertions
  • Working with Unicode and multibyte characters
  • Matching email addresses, URLs, dates, HTML tags, and credit card numbers
  • Using search and replace to format a document
Subject:
Developer
Software:
Regular Expressions
Author:
Kevin Skoglund

Writing logical and efficient alternations

Now that we have the basics of working with alternations, I want us to dive a little bit deeper and make sure that we write logical and efficient alternations. First, keep in mind the basic principles of regular expressions that we've seen so far: they're eager and they're greedy. Those have an impact on the way that the engine processes alternations. Let's go into regexpal, and let's start by just putting in peanut butter, and then up here let's put in either peanut|peanutbutter. Notice that it matched peanut, not peanut butter.

That's because it's eager. It's eager to return a result, and as we saw before, the leftmost item gets priority. Therefore, it's going to prefer to match the first item and never even attempt the second expression at all. Now if you wanted it to return peanut butter, we saw how to do that before by using our option. Let's put in this. There it is. Now it uses peanut and then butter is optional, but butter is preferred because, remember, it's greedy. So by default it's going to prefer that it has it over not having it.
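The eager and greedy behavior described above can be reproduced outside regexpal. This is a minimal sketch using Python's `re` module, which, like the engine in the video, tries alternatives from left to right. The test string `peanutbutter` (no space) and the pattern `peanut(butter)?` are assumptions about what was typed on screen.

```python
import re

text = "peanutbutter"  # assumed test string

# Eager: the leftmost alternative that succeeds wins, so once "peanut"
# matches, the longer "peanutbutter" alternative is never tried.
eager = re.search(r"peanut|peanutbutter", text)

# Greedy: the optional group prefers to match "butter" when it is present.
greedy = re.search(r"peanut(butter)?", text)

print(eager.group())   # peanut
print(greedy.group())  # peanutbutter
```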

See the difference? Let's try a slightly more complicated example. Let's say that here for a string we have a file name that's 2003_report.xls, and to match that, I'm just going to paste in a regular expression here. What I've got is an alternation. The first alternation is right here. It's just a word character one or more times, and then I've got a second choice which is that it's FY four digits_report.xls. Actually, let's escape this. There it is. That's correct. So now the second one is clearly a better match.

We look at the two and you think, oh well, the second one matches it really well. But which one is it using to match it? It's using the first one. That's why it didn't highlight all of the .xls that's using this expression right here. It was eager to return us a match, so we never tried the second one. It has no concept that that's a more appropriate match or anything like that. It just uses the left one. It says, I've got a match, here is your result, and just returns it back to you. Now let's take a little bit closer look at the way that the parser actually does parse things, because I want you to make sure that you understand this. Let's go back to our example where we had abc|def|ghi|jkl.
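Here is a small sketch of the file name example, again in Python's `re` module. The exact pattern and string are not shown in the transcript, so `\w+|FY\d{4}_report\.xls` and the file name `FY2003_report.xls` are assumptions reconstructed from the description. The swapped pattern is added only to illustrate the effect of order.

```python
import re

filename = "FY2003_report.xls"  # assumed file name

# The permissive \w+ comes first, so it wins and stops at the dot.
permissive_first = re.search(r"\w+|FY\d{4}_report\.xls", filename)

# With the specific alternative first, the whole file name is matched.
specific_first = re.search(r"FY\d{4}_report\.xls|\w+", filename)

print(permissive_first.group())  # FY2003_report
print(specific_first.group())    # FY2003_report.xls
```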

So it matched all of those. Turn off Global so you can see that it's matching just the abc. Eager to return a result to you. Now I try putting xyz at the beginning. That's the first option, xyz. Now what did it return to you? It did not jump down here and find xyz. It returned the first option. It returned the abc. That's because it doesn't scan the whole string looking for the first option and then come back and then scan the whole string looking for the second option. Instead, it starts at the beginning of the string and starts going through all those options.
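A quick way to check this is to ask for the start position of the first match. The sketch below assumes the test string is the lowercase alphabet, which the transcript does not state.

```python
import re

text = "abcdefghijklmnopqrstuvwxyz"  # assumed test string

# Without the global flag only the first match is returned.
# "xyz" is listed first, but "abc" occurs earlier in the string.
first = re.search(r"xyz|abc|def|ghi|jkl", text)

# The equivalent of turning Global back on: every match, in string order.
every = re.findall(r"xyz|abc|def|ghi|jkl", text)

print(first.group(), first.start())  # abc 0
print(every)  # ['abc', 'def', 'ghi', 'jkl', 'xyz']
```

The position in the string decides first; the order of the alternatives only matters among options that can match at the same position.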

Let's watch how it happens, and I think that will make more sense. So in my example here I have "I think those are thin trees," and I have four possible options in my alternation: three|see|thee|tree. So what happens is the parser starts at the beginning, and of course I doesn't match any of those, so we really don't need to worry about that. It finally though gets to the T and then it says, all right, do I have a match for this first option here? It says, yes, the first one does match. I see a T. So option one is a possibility. Let's check the next character. It's an H. Ah! Option one is still a possibility.

It goes to the I and it says option one is no longer a possibility, because it's not an R. So at that point it jumps back to the T again, and says all right, let's check option two now. Option two, well, that's an S. No, it can't be option two. Let's try option three. Option three has a T in it, so it says that's a possibility. Let's check the next character. It's an H, still a possibility. It gets to the I. No longer a match. Option three has now been ruled out. It jumps back to the T again. Now it says, let's try option four.

Option four, the T works. Try the second character. Nope, option four has been ruled out. All four options have been ruled out; therefore the T is not the beginning of a match. So we move to the next character, an H. Does H match the first one? No. The second one, the third one, the fourth one? None of those match. So it keeps working its way along until it gets to the T and the H of "those." So again, option one matches both of those. It gets to the O. It doesn't match, so it rewinds back to the T. Option two doesn't match.

Option three is a possibility. Let's check its next character. Still a possibility. It goes to the third character. No longer a possibility. It jumps back to the T. So you can see it does the same thing. It moves along and you get the same thing on "thin" as well. So when we finally get to the word "trees," then it says all right, the first one, that's a possibility. It's a T. So it goes to the second character. That's not a match. Rewind. Now it goes to the second option. That's not a possibility. We rule it out. We move to the third one. That has the T, great! Let's check the next character.

It expects an H, so it's no longer a possibility. So it rewinds back to the T and then it says, all right, the fourth option t-r-e-e, and then it knows it's got its match. So you see the little dance that it does with moving forwards and backwards as it tries each of these options. It does not take three and somehow scan the entire string looking for three and then back up and scan the whole string looking for see and then back up and scan the whole string looking for thee and then finally tree.

It doesn't do it that way; it does it by position as it moves through. So it moves through and does its little backtracking dance each time as it tries each of those four options, and that's what I want you to understand. You'll see why it does not highlight the xyz here at the end. It finds the abc instead, because it starts at the beginning and it says, no, first option is not a possibility. Then it moves to the second option and it says, here is a possibility. It moves along to the A and the B and the C and so it says, oh, got a match. I'm all done.
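The walk-through above can be sketched as a short simulation. This is not how any real engine is implemented (real engines compile the pattern and apply optimizations). The hypothetical helper `find_alternation` only mimics the order of attempts for literal alternatives: every option is tried at a position before the position advances.

```python
import re

def find_alternation(text, options):
    """Try each literal option, in order, at each position in text.

    Returns (start, option) for the first success, or None.
    """
    for start in range(len(text)):
        for option in options:
            # Compare character by character, stopping at the first mismatch.
            matched = 0
            while (matched < len(option)
                   and start + matched < len(text)
                   and text[start + matched] == option[matched]):
                matched += 1
            if matched == len(option):
                return start, option
            # Mismatch: "rewind" to start and try the next option.
    return None

text = "I think those are thin trees"
options = ["three", "see", "thee", "tree"]

result = find_alternation(text, options)
print(result)  # (23, 'tree')

# Python's own engine reports the same position for the same alternation.
print(re.search(r"three|see|thee|tree", text).start())  # 23
```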

It can very eagerly return that match to you. One last point I want to make about writing efficient alternations is perhaps common sense, but it is just to put the simplest or the most efficient expression first. Let's imagine that we have three alternations like this. So the first one says, look for any word character one or more, underscore, and then either two to four digits, or look for four digits, underscore, two digits, underscore, and some unknown quantity of word characters. Or, that's the third choice, look for the literal characters, export, and then two digits after it.

So that's what we're looking for. It's much more efficient if we flip it around and write it in the opposite way, because this way the regular expression engine can check a character and say, is this character an E? It is not an E; therefore option one has been ruled out right away. We're ready to move on to option two. Option two says, is it a digit? If it is a digit then we can go ahead and check it, but if it's not, we've ruled out two of our three options now, very quickly. And now we can check the third one, which is going to take a lot longer, because it's much more permissive. That first character can be a lot of things.

It can be any letter, any number, or an underscore, and the character after that can be any of those things, and so on. So it can check a lot of things and do a lot of backtracking as it tries to figure it out in that third option. But the first two have been quickly ruled out. And if we have a line where one of those first two items matches, then we save ourselves the trouble of doing that third step at all. We find the match and we move on. So hopefully now you have a good understanding of the way that alternations work, the way that the regular expression engine moves through them, and how you can write good, logical, and efficient alternations.
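As a rough sketch of the ordering advice, the two patterns below contain the same three alternatives in opposite orders. The exact expressions are not given in the transcript, so `\w+_\d{2,4}`, `\d{4}_\d{2}_\w+`, and `export\d{2}` are assumptions based on the spoken description, and `export12` is a made-up sample line. The timings are printed without expected values because they depend on the machine and the engine.

```python
import re
import timeit

# Same three alternatives, opposite orders (patterns are assumptions).
permissive_first = re.compile(r"\w+_\d{2,4}|\d{4}_\d{2}_\w+|export\d{2}")
specific_first = re.compile(r"export\d{2}|\d{4}_\d{2}_\w+|\w+_\d{2,4}")

line = "export12"  # hypothetical sample line

slow_match = permissive_first.search(line).group()
fast_match = specific_first.search(line).group()
print(slow_match, fast_match)  # export12 export12

# Time both orderings on the same line; results vary by machine.
for name, pattern in [("permissive first", permissive_first),
                      ("specific first", specific_first)]:
    seconds = timeit.timeit(lambda: pattern.search(line), number=20_000)
    print(f"{name}: {seconds:.4f}s")
```

One caution: reordering can also change which alternative wins when more than one can match at the same position, so it is worth checking that the reordered pattern still returns the matches you expect.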
