
Greedy expressions

From: Using Regular Expressions

Video: Greedy expressions

In this movie, we won't be introducing another metacharacter. Instead, we're going to be talking about an important principle of how regular expressions work. It's called greedy expressions. Often the regular expression engine has to make a choice about what it's going to return as a match. That becomes especially true now that we are using repetition expressions, because now our strings are of an indeterminate length. Greediness is really just a term to describe how the regular expression engine makes that choice by default. Let's look at some examples so we understand what the problem is. Let's say that we have an Excel file called 01_FY_07_report_99.xls, and we just have a really simple regular expression that says match some digits, followed by some word characters, followed by some digits. And don't forget, word characters can be letters, numbers, or underscores.


So the question becomes, does the engine look at this and say, ah, I see a match, it's 01_FY_07, or does it match the entire thing, all the way up to the .xls? Let's take a look at a second example. Let's imagine that we have a comma-delimited text file that has people's first name, their last name, and their company. So first name in quotes with a comma space, last name in quotes comma space, company name in quotes at the end. Well, what if we had a regular expression that looked for any set of characters inside quotes, comma space, any set of characters inside quotes?

Would the regular expression engine return to us the first name and the last name? That might be what we're expecting. Or would it return the last name and the company, or would it return all of it, in the case that maybe one of those wildcards with the plus after it actually could include the quote, the comma, the space, and the quote in between those two? I think you can see the problem. And remember, these are not complex regular expressions, and we can already see the choices that the engine has to make. Imagine what happens when our expressions become complex. Well, the answer to the question about what it's going to match is that standard repetition quantifiers are greedy.

That means that the expression tries to match the longest possible string. And when I say the expression, I don't mean the entire expression; I mean the repetition-quantified expression, all right, so that one part of the expression tries to match as much as it can. Of course, it's still going to defer to achieving an overall match. So for example, if we had a filename.jpg and we were searching that for some wildcard characters followed by .jpg, it wouldn't do us any good if the wildcard character was so greedy that it said, all right, this entire file name matches me, I match an F, I match an I, all the way down till it gets to the G, and then it says oops, but I didn't make a match overall because of the .jpg. That wouldn't do any good.

So the plus is greedy, but it gives back the .jpg at the end to make the match. You can think of it as rewinding or backtracking to make sure that it gets the match. So in this case, the wildcard that gets repeated would match filename, and then it would move to the next part of the expression to match the .jpg. Now, even though it does give back that portion, it's still greedy. It gives back as little as possible. So for example, let's say we had a string Page 266, and we had a wildcard that was repeated, followed by some digits that were repeated.
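The filename example can be tried out in a few lines. This is a minimal sketch using Python's `re` module; the exact pattern is an assumption based on the spoken description (a repeated wildcard followed by a literal .jpg), and the capture group is added only to expose what the wildcard ends up holding.

```python
import re

# Assumed pattern: repeated wildcard, then a literal ".jpg".
# The parentheses are not part of the description; they only let us
# inspect which characters the wildcard kept after backtracking.
pattern = re.compile(r"(.+)\.jpg")

match = pattern.search("filename.jpg")
whole = match.group(0)     # the overall match
wildcard = match.group(1)  # the part held by the repeated wildcard

print(whole)     # filename.jpg
print(wildcard)  # filename
```

The wildcard first runs to the end of the string, then gives back just enough characters for the literal .jpg to match.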

It doesn't say, oh, it would actually be really nice of me to include all the digits in this digit expression, right; it doesn't make that distinction. It doesn't say that somehow the digits are better off grouped together in one group than these other wildcard elements are. It doesn't do that. It parses through it item by item. The wildcard character matches Page 266 and then it gives back only what it has to in order to make the match, which is that final 6. Let's actually look at the way it parses it, and I think that'll become clear. So let's say we have that exact expression and the string is Page 266.

The regular expression engine starts at the P and says, ah! Does this match my wildcard character? It does. Great! Well, it's a repeated wildcard character, so let's see if the next one matches too. It goes to the a, and says yep, that matches my wildcard too. It goes to the g. That matches the wildcard, and then the e, and the space and the 2 and the 6 and even the last 6. And it says, ah! These all still match the wildcard. Boy, I'm doing great here. And then it gets to the end and it says, oops, I got to the end and I didn't get a match, so I probably shouldn't have been quite so greedy.

It knows that it had success with that first part of the expression, but it still didn't make an overall match. So it says, what if I was a little bit less greedy? What if I were to just go back one character and give that one to the next part of the expression and see if that makes it match? So now the wildcard is matching just Page 26, the 6 then goes to the second part of the expression, 0-9, and it says yup, that works. Now I have a match and I'm completely done. In effect, it's essentially the same thing as if the expression had been a wildcard with a 6 at the end.
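The walk-through above can be checked directly, and also imitated by hand. The sketch below assumes the pattern is a repeated wildcard followed by repeated digits, written here as `(.+)([0-9]+)` with capture groups added for inspection. The `greedy_split` helper is hypothetical and deliberately simplified: it is not how a real engine is implemented, it only mirrors the "take everything, then give back one character at a time" behavior described in the narration.

```python
import re
from itertools import takewhile

DIGITS = "0123456789"
text = "Page 266"

# Assumed pattern: repeated wildcard, then repeated digits.
match = re.search(r"(.+)([0-9]+)", text)
wildcard_part, digit_part = match.groups()
print(repr(wildcard_part), repr(digit_part))  # 'Page 26' '6'


def greedy_split(s):
    """Hypothetical helper: imitate greedy matching with backtracking.

    The wildcard starts by holding the whole string, then gives back one
    character at a time until the digit part can match right after it.
    """
    attempts = []
    for keep in range(len(s), 0, -1):
        head, tail = s[:keep], s[keep:]
        attempts.append(head)  # record what the wildcard held on this try
        if tail and tail[0] in DIGITS:
            # The digit part consumes the run of digits at the start of tail.
            digits = "".join(takewhile(lambda c: c in DIGITS, tail))
            return head, digits, attempts
    return None, None, attempts


head, digits, attempts = greedy_split(text)
print(attempts)  # ['Page 266', 'Page 26']
```

Only two attempts are needed: the wildcard holding the whole string fails, and giving back a single character is enough for the digits to match.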

Now if it hadn't found a match there, then guess what it would have done next. It would have backtracked one more step, and then it would have backtracked one more step, and kept scaling back its greediness to see if being less greedy would allow the rest of the expression to still match.

So to go back to our original examples and take a look at those, the answer is that it would match the entire thing, and that especially can throw you off. That second example in particular catches a lot of people, because they think, oh, I'm just looking for the first name and the last name. But the thing is that your wildcard is so broad that it is able to match so many things, and that greediness kicks in and it just keeps consuming parts of the string, so that the first wildcard is matching the first name and the last name, then comes the comma space, and then that second wildcard is matching the company name.
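Both original examples can be reproduced with a short sketch. The patterns are assumptions reconstructed from the spoken description, and the sample row (John, Smith, Acme Corp) is hypothetical, since the transcript does not give the actual names.

```python
import re

# Example 1: some digits, some word characters, some digits (assumed pattern).
filename = "01_FY_07_report_99.xls"
file_match = re.search(r"\d+\w+\d+", filename).group(0)
print(file_match)  # 01_FY_07_report_99

# Example 2: quoted field, comma space, quoted field (assumed pattern).
# The row contents are made up for illustration.
row = '"John", "Smith", "Acme Corp"'
row_match = re.search(r'"(.+)", "(.+)"', row)
first_wildcard, second_wildcard = row_match.groups()

print(row_match.group(0))  # "John", "Smith", "Acme Corp"
print(first_wildcard)      # John", "Smith
print(second_wildcard)     # Acme Corp
```

In the second example the first wildcard swallows the quote, comma, space, and quote between the first and last name, which is why the match covers the whole row rather than just the first two fields.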

So we've already seen one important principle about regular expressions, and that is that regular expressions are eager. Now we have seen the second one, which is that regular expressions are greedy, and it makes sense that the two of these go hand in hand. It's eager to give you a result, so what it does is it tries to just keep letting that first one do all the work. While we're already in the middle of it, let's keep going, get to the end of the string, and then when it doesn't work out, it will backtrack and try another one. It doesn't backtrack back to the beginning; it doesn't try all sorts of other combinations. It's still eager to get you a result, so it says, what if I just gave back one? Would that allow me to give a result back? If it does, great, it's done. It's able to just finish there.

It doesn't have to keep backtracking further in the string, looking for some kind of a better match or a match that's further along. So that's what the concept of greediness is. So don't forget, by default, regular expressions are eager and they are greedy.
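As a rough illustration of the two principles together, here is a small sketch in Python. The sample string and pattern are hypothetical and not taken from the course.

```python
import re

# Hypothetical string containing two runs of digits.
text = "Page 266 of 1024"

# Eager: the engine returns the first match it can complete, even though
# a longer run of digits appears later in the string.
# Greedy: at that first position, the repeated digit class takes all three
# digits instead of stopping after the first one.
first = re.search(r"[0-9]+", text).group(0)
print(first)  # 266
```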

This video is part of

Using Regular Expressions

59 video lessons · 11689 viewers

Kevin Skoglund
Author

 
  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
