Double-testing with lookahead assertions

From: Using Regular Expressions

In the previous movie, we learned about the basics of positive lookahead assertions. In this movie, I'd like us to revisit some of the examples that we just saw. So, we saw that we could have a lookahead assertion for seashore, followed immediately by another expression: S, E, A, and that would match sea in seashore, but not in seaside. Then we saw that that's the same thing as if we had S, E, A as an expression, followed by a lookahead assertion for shore. So why use one over the other? Both of these would match the exact same text.
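To make the two forms concrete, here is a minimal sketch using Python's `re` module. The transcript spells the expressions out loud rather than showing them, so the exact patterns, the sample string, and the helper name `matched_spans` are assumptions made for illustration.

```python
import re

# Two ways to match "sea" only when it begins the word "seashore".
assert_first = re.compile(r"(?=seashore)sea")  # assertion first, then the match
match_first = re.compile(r"sea(?=shore)")      # match first, then the assertion

def matched_spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]

sample = "seashore seaside"  # hypothetical test string
spans_a = matched_spans(assert_first, sample)
spans_b = matched_spans(match_first, sample)

# Both forms report the same span: the "sea" in "seashore" only.
print(spans_a, spans_b)
```

In both cases the match covers only the three characters of "sea", because the lookahead itself consumes nothing.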

However, there are two important differences that I want us to notice. The first is the order in which the expressions are executed. In the first example, it attempts our assertion before it attempts to match S, E, A. In the second example, it tries to match S, E, A before it tests our assertion. That order can make a big difference if you are trying to optimize your regular expressions for speed and efficiency. So you will want to keep that in mind, and just be mindful of which one is executing first. More importantly, the second difference is about the position where it starts looking for each expression.

In the first example, it will start at the beginning of the string seashore, and test for our lookahead assertion. If that assertion is true, then it rewinds back to the starting position, and begins testing the main expression for a match. That's why the assertion is called zero-width: it does that rewinding, so it consumes no characters. In the process, the regex engine travels across the same territory, the S, E, A in our string, two different times. By contrast, in the second example, it matches the first expression: S, E, A, and then without doing any rewinding, it checks for our lookahead expression.

Then once that's done, it rewinds back to the position right after the A that it matched. The territory, though, of S, E, A is only traveled over one time. So why does that matter? Well, because we are going over that same territory more than once, it allows us to require that the same text match more than one pattern. Let me give you an example. Let's say that we have a regular expression to just match a simple 10-digit phone number. This is the way phone numbers are formatted in the United States: three digits, dash, three digits, dash, four digits.

We know how to do that. Let's imagine, now, that we want to write a different regular expression that says from the beginning to the end of the phone number, we should only have the digits zero through five, and hyphen. No digits larger than five are allowed. What we can do with lookahead assertions is we can actually put both of those tests together, and test both of them to be true. So now I have a combination expression that says, alright, look ahead and see whether or not you have only the digits zero to five, and hyphens, and then if that's true, now match the format, and make sure that the format matches.
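The combination described above might look like the following in Python. This is a sketch under assumptions: the transcript does not show the expression as typed, so the pattern spelling, the sample numbers, and the function name `is_valid` are illustrative guesses rather than the author's exact text.

```python
import re

# Test 1 (the main expression): three digits, dash, three digits, dash, four digits.
phone_format = r"\d{3}-\d{3}-\d{4}"

# Test 2 (the assertion): from here to the end, only the digits 0-5 and hyphens.
low_digits_only = r"(?=[0-5-]+$)"

# The assertion runs first, rewinds, and then the format is matched.
combined = re.compile(r"^" + low_digits_only + phone_format + r"$")

def is_valid(number):
    """True only if the number passes both the digit-range and format tests."""
    return combined.match(number) is not None

print(is_valid("555-123-4321"))  # passes both tests
print(is_valid("555-123-4721"))  # contains a 7, so the assertion fails
print(is_valid("555-1234-321"))  # only 0-5 and hyphens, but the wrong format
```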

Now, of course, you could write the first expression by defining our backslash D as being just a character set, zero to five, but that's not the point. The point is that the lookahead assertions allow us to run two different regular expression tests on the same string before it returns a successful match, and you aren't limited to just two. Because it rewinds each time, we can continue stacking these assertions. So, for example, let's say that we check that it's zero to five, but then we also check that the string somewhere has the digits four, three, two, and one in it. So now we have three regular expressions that are all being run on the same string.
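One way to see that three expressions really are being run on the same string is to write each test as its own pattern and compare the results with the stacked version. The sketch below does that in Python; the pattern spellings, the sample numbers, and the names `SEPARATE_CHECKS`, `run_separately`, and `run_stacked` are assumptions for illustration.

```python
import re

# Each test written as its own pattern; re.match always starts at position 0.
SEPARATE_CHECKS = {
    "only 0-5 and hyphens": r"[0-5-]+$",
    "contains 4321": r".*4321",
    "phone format": r"\d{3}-\d{3}-\d{4}$",
}

# The same three tests stacked: two assertions, then the matching expression.
STACKED = re.compile(r"^(?=[0-5-]+$)(?=.*4321)\d{3}-\d{3}-\d{4}$")

def run_separately(number):
    """Run every test independently from the start of the string."""
    return {
        name: re.match(pattern, number) is not None
        for name, pattern in SEPARATE_CHECKS.items()
    }

def run_stacked(number):
    """Run the single combined expression."""
    return STACKED.match(number) is not None

for number in ["555-123-4321", "555-473-4321", "555-123-4512"]:
    print(number, run_separately(number), run_stacked(number))
```

The stacked expression succeeds exactly when all three separate tests succeed, which is the sense in which the assertions let one expression hold several requirements.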

The first two are assertions, and they both have to match, or it won't try the third matching expression. This is powerful stuff that lets us write expressions that we wouldn't be able to write otherwise. Let's try it for ourselves. So let's just put in some test data here. I am going to put in three different phone numbers. I have broken them onto separate lines. So I am going to use multi-line anchors, because I know I am going to be using some anchors as well, and then let's start writing an expression. So I can have backslash, D, times three, dash, backslash, D, three times, dash, backslash, D, four times. So now we've matched all of those phone numbers.

What we are going to do now is put a lookahead assertion at the beginning that's going to say that from the beginning to the end that we should have only characters zero through five, and also the dash repeated. So there it is. You see? We only found the two phone numbers which have only the digits zero through five. That middle one got excluded. Now let's put in our next one, and this one -- let's put an equals -- and in this expression, we are going to say that there is a wildcard that can occur zero or more times, and then four, three, two, one.
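The on-screen demo can be approximated in Python with the `re.MULTILINE` flag, so that the anchors apply to each line. The three phone numbers below are hypothetical stand-ins for the test data in the video, which the transcript does not list, and the pattern spellings are likewise assumed.

```python
import re

# Hypothetical test data: one phone number per line.
test_data = "555-123-4321\n555-473-4321\n555-123-4512"

# Step 1: the format alone.
format_only = re.compile(r"^\d{3}-\d{3}-\d{4}$", re.MULTILINE)

# Step 2: add an assertion that the line holds only the digits 0-5 and hyphens.
with_range = re.compile(r"^(?=[0-5-]+$)\d{3}-\d{3}-\d{4}$", re.MULTILINE)

# Step 3: add a second assertion that 4321 appears somewhere on the line.
with_both = re.compile(
    r"^(?=[0-5-]+$)(?=.*4321)\d{3}-\d{3}-\d{4}$", re.MULTILINE
)

all_numbers = format_only.findall(test_data)
low_digit_numbers = with_range.findall(test_data)
final_numbers = with_both.findall(test_data)

print(all_numbers)        # all three lines
print(low_digit_numbers)  # the middle line, which has a 7, is excluded
print(final_numbers)      # only the line that also contains 4321
```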

We want to find the digits four, three, two, one in there somewhere. So now notice what it's doing: it's going over that same territory three times. So when it gets that first phone number, it first goes over it and makes sure that it has digits zero through five and hyphens, then it goes through and makes sure it has four, three, two, one, and then it checks to make sure that it's properly formatted. On the second phone number, it tries the first assertion, and it fails. As soon as it gets to that seven, it says oops, nope, failed the assertion, move on; don't try the other two. Then it tries the third phone number.

It makes sure that the first assertion passes. So it does have only the digits zero through five. Then it goes and looks and when it realizes that there is no four, three, two, one in there, then it stops, and it never attempts the third expression, the format match. Let's revisit our words that are followed by commas example. I am going to open up the Self-Reliance text that we had before. I'll just copy that, and we'll just paste that in down here. Now I am just going to paste back in the expression that we had before that finds all words, and we are looking ahead to see if there's a comma after it. Now, in addition to that, right after the word boundary -- so we first make sure we are at the start of a word -- then let's put another lookahead assertion.

So right here, we are going to do a lookahead assertion, and let's find all words that contain a G, H in them. So words that contain a G, H would be some word character -- we don't know how many, there might be none -- followed by the letters G and H. Do you see how that works? We are now using an assertion to make sure that it has a G and H in it, and if the word has a G and H in it, then we make sure that it has the letters A to Z and apostrophe. Then if that's true, then the last thing we do is we check to see, if we kept going, would the next character be a comma.
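As a rough sketch of the expression described above, the following Python example finds words that contain "gh" and are immediately followed by a comma. The pattern is reconstructed from the spoken description, and the sample sentence is a made-up stand-in for the Self-Reliance text, so both should be read as assumptions.

```python
import re

# \b          start of a word
# (?=\w*gh)   assertion: the word contains "gh" somewhere
# [A-Za-z']+  the word itself: letters and apostrophes
# (?=,)       assertion: the next character would be a comma
gh_before_comma = re.compile(r"\b(?=\w*gh)[A-Za-z']+(?=,)")

# Hypothetical sample text, not taken from the essay.
sample_text = "Night, day, and thought, though brought low."

gh_words = gh_before_comma.findall(sample_text)

# "day" is followed by a comma but has no "gh";
# "though" and "brought" have "gh" but no comma after them.
print(gh_words)
```

Only the words that satisfy both assertions and the main expression are returned, and the commas themselves are not part of the matches.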

This expression would be difficult to write without using lookahead assertions. And even more than that, it makes it clear what our intention is. It's much easier to read and understand what we're going for, and what our requirements are. Let's take a look at another simple example. Let's say that we want to have a password, and we want to make sure that the password, from start to end, matches any characters, and we'll make them eight to fifteen characters long for our password. And in addition, we want to check and make sure that the password has a digit in it. Well, we know how to do that now.

It can have zero or more wildcard characters, followed by a digit. So if we have swordfish, it doesn't match. If I put in sword42fish, now it does match. It requires not only that the password be eight to fifteen characters long, but also that there's at least one digit in it. If we want to make sure that it has an uppercase letter as well, well we can just add another one. We can have character sets A to Z, and then let's put in some indeterminate characters before it. There we go.
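The password checks can be sketched in Python as follows. The patterns are reconstructed from the spoken description and are therefore assumptions; the capitalized test password "Sword42fish" is also made up for illustration, since the transcript does not say which letter was capitalized.

```python
import re

# Eight to fifteen characters of any kind, with at least one digit.
digit_required = re.compile(r"^(?=.*\d).{8,15}$")

# The same, with a second assertion requiring an uppercase letter.
digit_and_upper_required = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,15}$")

def passes(pattern, password):
    """True if the password satisfies every test in the pattern."""
    return pattern.match(password) is not None

print(passes(digit_required, "swordfish"))              # no digit
print(passes(digit_required, "sword42fish"))            # has a digit
print(passes(digit_and_upper_required, "sword42fish"))  # no capital letter
print(passes(digit_and_upper_required, "Sword42fish"))  # digit and capital
```

Each extra requirement is one more assertion at the front, and the length rule stays in the main expression.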

Now there is no capital letter, so it fails. As soon as we put in a capital letter, now it matches. The ability to use lookahead assertions to double-check with multiple expressions is a powerful tool. So far we've only been working with positive lookahead assertions, though. In the next movie, let's look at negative lookahead assertions.

This video is part of

Using Regular Expressions

59 video lessons · 11689 viewers

Kevin Skoglund
Author

  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
