
Word boundaries

From: Using Regular Expressions

Video: Word boundaries

In this movie, we are going to learn about another kind of anchored expression, and that is using word boundaries. The metacharacters we are going to use are the lowercase b and the uppercase B, with a backslash in front of them: \b and \B. Lowercase b is a word boundary; that is, the start or the end of a word. The uppercase B is not a word boundary. And that's the same pattern we have seen before, like when we had the word character: lowercase w was a word character; uppercase W was not a word character. Just like the other anchored expressions we have seen, they reference a position, not an actual character.
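To make the "position, not a character" idea concrete, here is a minimal sketch. It assumes Python's re module (the movie itself works in RegexPal), and the helper names boundary_positions and mark_boundaries are made up for illustration.

```python
import re

def boundary_positions(text):
    """Return the index of every word-boundary position (\\b) in text."""
    return [m.start() for m in re.finditer(r"\b", text)]

def mark_boundaries(text):
    """Insert a '|' at every word boundary so the positions become visible."""
    return re.sub(r"\b", "|", text)

print(boundary_positions("top notch"))    # [0, 3, 4, 9]
print(mark_boundaries("this is a test"))  # |this| |is| |a| |test|

# Every \b match is empty: it marks a position and consumes no characters.
print(all(m.group() == "" for m in re.finditer(r"\b", "top notch")))  # True
```

Each boundary shows up as an index between characters, which is why the matches themselves are empty strings.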

What decides whether or not something is a boundary is based on a couple of conditions. If the first character in the entire string is a word character, there is a boundary right before it, so that's the first boundary you are going to have. At the end of the string, if the very last character is a word character, it gets a boundary after it as well. And then, in between those two, every single time the text shifts between a word character and a non-word character, we're going to have another boundary. Remember, word characters are the capital letters A to Z, lowercase a to z, 0 to 9, and the underscore.

That's the same thing as the shorthand character class, backslash w. So anytime we switch between one of these things and something that's not one of these things, we have got another boundary. Support for these metacharacters is going to be in most regular expression engines, but not in the really early UNIX tools, the BREs. Essentially what that means is you can use them in egrep, but not in grep. Let's take a look at some examples. Just finding a simple word, we might say, alright, we have a boundary, followed by some word characters, followed by another boundary: \b\w+\b.

And when we apply that to a sentence, it will find four matches in the string: this is a test. This is one of them, then is, a, and test. No spaces, no punctuation gets matched; just the word characters with the boundaries on either side. If we had abc_123, well, that would match the whole thing, because remember, underscore, and 1, 2, 3 are word characters. But in top notch, well, in that case there are actually two words, and four boundaries. There is a boundary before the T, there is a boundary after the P, a boundary before the N, and a boundary after the H. So we come up with two words: top, and notch.
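The three examples just described can be checked with a short sketch. Python's re module is assumed here, and find_words is a hypothetical helper name, not something from the course.

```python
import re

# A boundary, one or more word characters, then another boundary.
WORD = re.compile(r"\b\w+\b")

def find_words(text):
    """Return every run of word characters that has a boundary on each side."""
    return WORD.findall(text)

print(find_words("This is a test"))  # ['This', 'is', 'a', 'test']
print(find_words("abc_123"))         # ['abc_123']
print(find_words("top notch"))       # ['top', 'notch']
```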

Now, those are boundary examples. You can also have not a boundary. These aren't necessarily quite as useful as the boundary ones, but sometimes they can be. If we wanted to find capital This, when it was not at a boundary for some reason -- there was something in front of it -- then we would use the capital B, and we would not have a match in the case of just This is a test, because the first character in the string is counted as a boundary. If we instead used that same pattern of backslash w with the plus sign, with a capital B on either side of it (\B\w+\B), it would find two matches. The first of those matches would be h and i inside This, because neither of those characters has a boundary on either side of it.
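Here is a small sketch of the non-boundary case, again assuming Python's re module. The helper names and the sample string "xThis" are invented for illustration; only "This is a test" comes from the movie.

```python
import re

def not_at_start(word, text):
    """Search for word only where it does NOT begin at a word boundary."""
    return re.search(r"\B" + re.escape(word), text)

def inner_runs(text):
    """Find runs of word characters with a non-boundary (\\B) on both sides."""
    return re.findall(r"\B\w+\B", text)

# 'This' sits at the very start of the string, which is a boundary: no match.
print(not_at_start("This", "This is a test"))   # None

# With a word character in front of it, the position before 'T' is not a boundary.
print(not_at_start("This", "xThis").group())    # This

# Only the interior letters of the longer words qualify.
print(inner_runs("This is a test"))             # ['hi', 'es']
```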

Every other character in This does have a boundary on one side or the other. The second match is e and s in test. Let's take a look in RegexPal. I'm going to paste in a Shakespeare sonnet here, just so we have some text to work with. This is in the exercise files. And then, for a word, let's just try that simple one. We are looking for any word character repeated, with boundaries on either side. So you can see what it picked out as being the words that have the boundaries there. Each of the words -- not the spaces, not the punctuation; those are not counted.

Now, if we wanted those, we could just put square brackets around this. Let's say, for example, we wanted summer's to be included, so we put an apostrophe in there. It is including it in our match. That still doesn't mean there's not a word boundary there. There is still a word boundary after the word summer, and then another one after the apostrophe, before the letter s. How do we know that? Well, let's make it not greedy. If we put the lazy question mark after the plus, you can see now it didn't count it, but when it's greedy it said ah! I'll go ahead and just keep consuming things, and ignore word boundaries, and keep consuming characters as a match to my pattern until I run out of things that fit the character set, and then I'll check to see if there's a word boundary.
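The greedy versus lazy behavior around the apostrophe can be sketched as follows. This assumes Python's re module, and the sample phrase "a summer's day" is an assumption: the transcript only says the text is a Shakespeare sonnet containing summer's.

```python
import re

SAMPLE = "a summer's day"  # assumed sample text, not taken from the exercise files

# Word characters or apostrophes, with a boundary on each side.
GREEDY = re.compile(r"\b[\w']+\b")
# Same set, but lazy: the boundary is re-checked after every character consumed.
LAZY = re.compile(r"\b[\w']+?\b")

print(GREEDY.findall(SAMPLE))  # ['a', "summer's", 'day']
print(LAZY.findall(SAMPLE))    # ['a', 'summer', "'", 's', 'day']
```

The lazy version stops at the first boundary it can reach, which exposes the boundaries on both sides of the apostrophe that the greedy version ran straight past.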

This way it keeps checking constantly; every single time it consumes another character, that lazy expression makes that check again. Let's just try another example that we've seen before. Let's take this back out. For starters, let's just do backslash w plus, with an s after it (\w+s), and let's use We picked apples as the text. We did that before, and it matches all plural words; everything that's word characters, followed by a literal s. Before, we talked about the efficiency of that, and the efficiency of using repetition. One way we can really improve the efficiency of that is to put the boundaries on either side of it. Say we are looking for a whole word; don't look for things that are partial words, don't waste your time with those, only zero in straight on the whole word, and that does give us quite a bit of speed improvement.
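A minimal sketch of the whole-word idea, assuming Python's re module. It only compares what the two patterns match; it does not measure speed. The second sample string, "a house of glasses", is invented to show a partial-word match.

```python
import re

UNANCHORED = re.compile(r"\w+s")      # word characters followed by a literal s
ANCHORED = re.compile(r"\b\w+s\b")    # the same, but only as a whole word

# On the sentence from the movie, both patterns find the same plural word.
print(UNANCHORED.findall("We picked apples"))    # ['apples']
print(ANCHORED.findall("We picked apples"))      # ['apples']

# On other text, the unanchored pattern settles for part of a word.
print(UNANCHORED.findall("a house of glasses"))  # ['hous', 'glasses']
print(ANCHORED.findall("a house of glasses"))    # ['glasses']
```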

Let's look at it. So, if the parser is going through the sentence, We picked apples, it starts with the W, and it says alright, I've got my word boundary. That condition's met. Do I have my second condition? The second expression, which is this repeated word character -- I do. Now it goes to the e; that's a repeated word character. Now it goes to the space, and says alright, that is not a word character. It moves on to the next part of the expression, the literal s, and it says this is not an s, so it fails to match, and it backtracks. Starting over at the e, it says alright, do we have a word boundary here? And actually, we don't really have one there. The word boundary really occurs right after that; there's an ending word boundary, so it actually waits until it gets to the next character. Right there it says alright, we have got a word boundary, but this is not a word character, so it keeps moving until finally it finds the condition again where it's a word boundary, followed by a word character: picked.

It works its way along, just like we saw before. This time, when it gets to the space it says, this is not a word character; this is not an S. So it does backtrack to the I, just like it did before. No difference here, but here is where it changes. It's not trying to match a word character; it's looking for a word boundary. No word boundary, no word boundary, no word boundary, no word boundary, no word boundary, ah! I have a word boundary, but I don't have a word character. Now I have a boundary, followed by a word character again.

Do you see all that backtracking that it skipped? It no longer backtracked and tried icked, cked, ked, ed, d; it left all that out, and just went 'til it found the start of the next word. Much more efficient. And then, of course, it works its way along until it matches apples. Now, there is one important word of caution that I want to give you, and that is that a space is not a word boundary. In regular grammar, the purpose of having a space in a sentence is to denote the boundary between words so that all the words don't just run together.

But in regular expressions, that's not the way it is. A word boundary references a position, not an actual character; it doesn't represent that space. So, for example, if we had the string apples, space, and, space, oranges, it does not match if we have apples, boundary, and, boundary, oranges (apples\band\boranges); that is not a match. The way it would match would be if you had apples, boundary, space, boundary, and, boundary, space, boundary, oranges (apples\b \band\b \boranges).
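The apples and oranges caution can be verified with a short sketch, assuming Python's re module. The variable names are illustrative only.

```python
import re

TEXT = "apples and oranges"

# \b consumes nothing, so these patterns leave the spaces unaccounted for.
BOUNDARY_ONLY = re.compile(r"apples\band\boranges")

# Each space has to be matched literally, in addition to the boundaries around it.
BOUNDARY_AND_SPACE = re.compile(r"apples\b \band\b \boranges")

print(BOUNDARY_ONLY.search(TEXT))               # None
print(BOUNDARY_AND_SPACE.search(TEXT).group())  # apples and oranges
```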

It's easy for us to think of spaces as being boundaries, but a word boundary is something different in regular expressions. It really is the point at which the text switches from a non-word character to a word character, or the other way around.

This video is part of

Using Regular Expressions

59 video lessons · 11697 viewers

Kevin Skoglund
Author

  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
