
Finding words near other words

From: Using Regular Expressions

In this movie, we are going to use regular expressions to help us identify words when they're in close proximity to other words inside a text. This is more powerful than a typical Find & Replace, where we can simply say find this word when it's in front of this word. Now we can say find it, and there may be some space in between. There may be a few other words thrown in there. We want to know if it's nearby. So to start with, let's get a text to work with. In the exercise files I have given you Ralph Waldo Emerson's Self-Reliance text. I'm going to put it in RegexPal, and we are not going to use our multi-line anchors this time like we've been doing for all the other ones, because we are going to be checking for things that are not anchored.

So you can check for any two words inside this text. The words that I am going to look for are A -- just the simple letter A as a single word -- and man. So any time we have a man. We want it to find a man, but we also want it to find this one here, a perfect man, where we have the word perfect in between. So let's write a regular expression to do that. Let's put in A, and I am going to put it in a character set, because it can be capital or lowercase. And then after that, a wildcard, repeated zero or more times. And then man, which can also be capitalized or not. And now you can see that it made a bunch of matches for me.
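As a rough sketch of that first expression, here it is in Python's re module, which stands in for RegexPal here; the sample lines are made up for illustration and are not taken from the exercise files.

```python
import re

# Hypothetical sample lines standing in for the essay text.
text = "To be a man is hard.\nHe is a perfect man."

# [Aa]   : the letter A in either case
# .*     : zero or more of any character (the dot stops at line breaks by default)
# [Mm]an : "man" or "Man"
pattern = re.compile(r"[Aa].*[Mm]an")

matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['a man', 'a perfect man']
```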

You can scroll through these, and see all of the matches that it made. Why did I use the star here? You don't have to; you could use the plus sign. If we were looking for two words that could be right next to each other, like peanut butter, and we wanted to allow for nothing at all in between peanut and butter, then you'd want to use the star. For what we are doing here, though, we're not looking for A, M, A, N all run together, so we can use the plus sign. We can also make it a little more readable by using grouping, just to separate those words out. And if you do that, you also, of course, want to make each one a non-capturing group, because that's going to be a little more efficient.
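To see the difference between the star and the plus sign, here is a small sketch using the peanut butter case. It assumes Python's re module rather than RegexPal, and the test strings are invented.

```python
import re

# Non-capturing groups (?:...) only separate the pieces visually.
star = re.compile(r"(?:peanut)(?:.*)(?:butter)")  # zero or more characters between
plus = re.compile(r"(?:peanut)(?:.+)(?:butter)")  # at least one character between

run_together = "peanutbutter"
spaced = "peanut butter"

star_hits = [bool(star.search(s)) for s in (run_together, spaced)]
plus_hits = [bool(plus.search(s)) for s in (run_together, spaced)]
print(star_hits)  # [True, True]
print(plus_hits)  # [False, True]

# Nothing is captured, so there are no numbered groups to refer back to.
captured = star.search(spaced).groups()
print(captured)  # ()
```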

We could use the Dot matches all mode. Here we have got this dot repeating. There is Dot matches all, which we can just click this checkbox for; it's the S modifier. And if you remember, that wraps across line breaks as well. So that means that the dot can now match a line return, otherwise it typically doesn't. Now once we turn that on, we see another problem, which you may have noticed before, which is that it's actually matching from the first A that it finds, all the way down to the last man that it finds. Here it is, because it's being greedy. We're seeing greediness in action here.
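The effect of Dot matches all on a greedy wildcard can be sketched like this. The example assumes Python, where the S modifier is spelled re.DOTALL, and uses a made-up two-line text.

```python
import re

# Hypothetical two-line sample.
text = "He is a man.\nShe met a perfect man.\n"

# Without DOTALL the dot stops at each line break, so matches stay on one line.
per_line = re.compile(r"[Aa].+[Mm]an")
line_matches = per_line.findall(text)
print(line_matches)  # ['a man', 'a perfect man']

# With DOTALL the greedy .+ runs from the first A to the last "man" in the text.
dot_all = re.compile(r"[Aa].+[Mm]an", re.DOTALL)
greedy_match = dot_all.search(text).group(0)
print(repr(greedy_match))  # 'a man.\nShe met a perfect man'
```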

So we need to make this not greedy, and that will make it stop at the next occurrence of that second word -- in our case man -- that it can find. It's also finding A when it's not just by itself. We can use word boundaries to further improve it. Backslash, B before the A, and backslash, B after it, and then backslash, B, man, and backslash, B at the end. Now it's finding them just when each one is a whole word. Now, obviously that wouldn't find peanut butter anymore, but we had already made that choice, and decided that we didn't care about finding those when they were right next to each other.
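Here is a minimal sketch of what the lazy quantifier and the word boundaries change, assuming Python's re module and an invented sample. Notice how the loose version picks up the A inside demands and the man inside woman.

```python
import re

# Hypothetical sample with "a" and "man" hiding inside longer words.
text = "It demands a perfect man.\nA man, and not a woman, spoke."

# Lazy, but no word boundaries: any letter a and any "man" will do.
loose = re.compile(r"[Aa].+?[Mm]an")
loose_matches = loose.findall(text)
print(loose_matches)  # ['ands a perfect man', 'A man', 'and not a woman']

# Lazy, with \b on both sides of each word: whole words only.
strict = re.compile(r"\b[Aa]\b.+?\b[Mm]an\b")
strict_matches = strict.findall(text)
print(strict_matches)  # ['a perfect man', 'A man']
```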

If you really did need that behavior, you could always put an alternation in here, and say that it's also possible to find the two words when they are immediately side by side. So that does it. Let's scroll down here, and let's look at our list. We have got a perfect man, down here is a man, a certain alienated majesty blah, blah, blah. Boy! That's a long one until we finally get down here for man. I probably don't actually want this Dot matches all, so let's take that away. That at least makes it a little better. Now it found this a man, and this one here. Scroll down; a divine man, a dinner, and -- boy, that's a lot of stuff before we finally get down here to this man.
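One way the alternation might look, as a sketch only: the exact form is an assumption, since the pattern is not spelled out in the movie. It uses Python's re module and the peanut butter words from earlier.

```python
import re

# First alternative: the two words run together as one word.
# Second alternative: two whole words with at least one character in between.
pattern = re.compile(r"\bpeanutbutter\b|\bpeanut\b.+?\bbutter\b")

samples = ["peanutbutter", "peanut butter", "peanut and jelly"]
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [True, True, False]
```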

Let's further improve it, so that we don't get that long thing. We could say that this could be any character except something like a period, or comma, or semicolon, and so on, and now it no longer finds that match. It does still find our other matches up here. Another improvement we can make to it, is to say, well we want to find any time it's within a certain number of characters. Right now we said we don't care how many characters are in between. We could put a quantifier on this, and say, well it actually can be between 1 and 20 characters long.
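Both improvements can be sketched as follows, assuming Python's re module. The sample text, the phrase, and the helper name within_chars are made up for illustration.

```python
import re

# Hypothetical sample: the first A and man are separated by a period.
text = "He ate a dinner. Then came the man.\nHe is a divine man."

any_char = re.compile(r"\b[Aa]\b.+?\b[Mm]an\b")
no_punct = re.compile(r"\b[Aa]\b[^.,;]+?\b[Mm]an\b")

any_matches = any_char.findall(text)
punct_matches = no_punct.findall(text)
print(any_matches)    # ['a dinner. Then came the man', 'a divine man']
print(punct_matches)  # ['a divine man']


def within_chars(max_chars):
    """Hypothetical helper: allow 1 to max_chars characters between the words."""
    return re.compile(r"\b[Aa]\b[^.,;]{1,%d}?\b[Mm]an\b" % max_chars)


phrase = "a wise and truly independent man"  # 28 characters between the two words
found_20 = bool(within_chars(20).search(phrase))
found_30 = bool(within_chars(30).search(phrase))
print(found_20, found_30)  # False True
```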

So any time it's between 1 and 20 characters, we'll find it. Or you could adjust it then, and say, oh you know what? Actually let's extend to 30 characters, or let's dial it back down to 10. That can control how much space is actually between these two words. Probably, though, we don't care about characters as much as we do, maybe, how many words are in between. So we could modify what's in between here, so that it would actually use words instead. So let's take all of this out here, and let's rethink this for a second. It's still going to be a non-capturing group, but the way that we identify a word is that it's a space, followed by word characters, followed by either a hyphen, or a space.

Now, you could include punctuation in there if you wanted to allow it to cross those punctuation boundaries. So now we've defined a word; now what we want is to repeat it. I am actually going to take the space out of the front, and put the space here, so it's going to be A, space, and then a word with a space or a hyphen, and then that will be repeated, and we can make it repeated, let's say it could be 0 to 5 times. So there can be up to 5 words, now, in between our two target words. And again, you can adjust this, and dial it down, and say, alright; I only want to allow one word in between, or I only want to allow three words in between.
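Counting words instead of characters might look like the sketch below. It assumes Python's re module; the helper name within_words and the test phrases are hypothetical.

```python
import re


def within_words(max_words):
    """Hypothetical helper: A, a space, then 0 to max_words words, then man.

    Each in-between word is word characters followed by a space or a hyphen.
    """
    return re.compile(r"\b[Aa]\b (?:\w+[ -]){0,%d}[Mm]an\b" % max_words)


phrase = "a perfect old man"  # two words in between
one_word = bool(within_words(1).search(phrase))
two_words = bool(within_words(2).search(phrase))
print(one_word, two_words)  # False True

# Zero words in between is allowed, and a hyphenated pair counts as two words.
adjacent = bool(within_words(5).search("a man"))
hyphenated = bool(within_words(2).search("a well-read man"))
print(adjacent, hyphenated)  # True True
```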

Let's just try a perfect old man, and then we can see the difference. We do 2, and we do 1; now suddenly it's not allowed. We have to have 2 inside there. And then last of all, remember that we can use lookahead assertions to control what actually gets matched. So, for example, if we are interested in matching the A here, and that's what we really want to focus in on, then everything that comes after it, we can just wrap all of this inside a lookahead assertion. Question mark, equals; there we go, and now it matches just the A. You can also use captures if you want to capture just certain parts of it to be able to work with it during a Find & Replace.
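Wrapping everything after the A in a lookahead can be sketched like this, assuming Python's re module and an invented sentence. The replacement with "the" is only there to show that nothing but the A is consumed.

```python
import re

text = "He is a perfect man."

# The lookahead (?=...) checks that "man" follows within five words,
# but only the A itself becomes part of the match.
article = re.compile(r"\b[Aa]\b(?= (?:\w+[ -]){0,5}[Mm]an\b)")

m = article.search(text)
matched, span = m.group(0), m.span()
print(matched, span)  # a (6, 7)

# In a find and replace, only the A is swapped out.
replaced = article.sub("the", text)
print(replaced)  # He is the perfect man.
```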

Now here is the thing; this checks for A when it comes before man, but if you are looking for the opposite, if we wanted to find man when it comes before A, or word two in front of word one, you'd want to flip the order around and search again, because it's very hard, almost impossible, to use lookbehind assertions in this case, because the length of the words in between there is indeterminate. And one of the restrictions on using lookbehind assertions in most regular expression engines is that they can't use variable lengths. So because it's an indeterminate length, we won't be able to use lookbehind assertions.
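Python's built-in re module is one engine with this restriction, so it makes a convenient sketch of the problem. The lookbehind pattern below is a hypothetical attempt, not something shown in the movie.

```python
import re

# A lookbehind containing a variable number of words has no fixed width,
# so Python's re module refuses to compile it.
try:
    re.compile(r"(?<=\bman (?:\w+[ -]){0,5})\b[Aa]\b")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("rejected:", exc)

print(lookbehind_ok)  # False
```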

So if you wanted to match the other way around, you just have to flip the words, and search a second time. I think this can be a very powerful and useful technique. Don't think that it's just for essays either; it can also apply to searching inside code as well.

Kevin Skoglund
Author

 
