
Backreferences

From: Using Regular Expressions

Video: Backreferences

In this chapter, we'll learn about capturing groups, and how to use backreferences to access those groups. If you remember back in Chapter 5, when we talked about grouping expressions by using parentheses, I mentioned that grouped expressions are captured by the regex engine. What this means is that as the regex engine is going through and finding matches, it stores the matched portion that's in parentheses. It bookmarks it, and remembers it for later. So for example, let's say we have a regex like a(p{2}l)e; that is, a, open parenthesis, p with the quantifier two after it, l, close parenthesis, e.

Well, we already know, obviously, that's going to match apple, but at the same time that it's matching it, the regex engine also says, ah! Inside those parentheses, it matched P, P, L. I am going to store that, and remember it for later. Notice it stores the actual data that was matched; not the expression. So the actual match that it made, it stores that data from the string, so we can have it for later, and this happens automatically, and by default. It doesn't matter if you were using those parentheses for repetition, for alternation, or just helping keep your regex organized.
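To make the capture concrete, here is a minimal sketch using Python's standard re module (the choice of Python is an assumption; the course itself is engine-neutral). It runs the apple regex described above and shows that the group holds the matched text, not the expression.

```python
import re

# The parentheses capture whatever text the group actually matched.
match = re.search(r"a(p{2}l)e", "apple")

whole_match = match.group(0)  # the entire match
captured = match.group(1)     # the text matched inside the parentheses

print(whole_match)  # apple
print(captured)     # ppl
```

Note that group 1 holds ppl, the data from the string, rather than the expression p{2}l.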

When the regex engine sees those parentheses, it captures the data that's in there for later. Of course, if it stores that data, then we need a way to be able to access it, and the way that we do that is using backreferences. Backreferences are the tool that allows us to access the captured data, and we can refer to those different backreferences with the syntax backslash, followed by a number. So the first backreference would be backslash one. So the metacharacters you would use for the backreferences would be backslash one, through backslash nine, and those would refer to backreference positions one to nine, referring back to the part that was captured, so we can use it again.

Now, there are typically two ways that you would use those backreferences. The first is to refer back within the same expression: in the same expression where you captured the group, later on you refer back to what was captured earlier. The second is to refer back to that captured data after the matching is complete. To do that, you really need to be inside something like a programming language, so that the matched data can be stored in a variable, and you can refer to those different matched portions. Another place that it comes up is inside a text editor.
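As a sketch of the second way, referring to captured data after the match is complete, here is how it might look in Python's re module. The pattern and sample string are illustrative assumptions, not taken from the course files.

```python
import re

# Two capturing groups, numbered left to right starting at 1.
match = re.search(r"(apples) to (oranges)", "comparing apples to oranges")

first = match.group(1)       # text captured by the first pair of parentheses
second = match.group(2)      # text captured by the second pair
all_groups = match.groups()  # every captured group, as a tuple

print(first)       # apples
print(second)      # oranges
print(all_groups)  # ('apples', 'oranges')
```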

If you are doing Find and Replace, then the part that's captured during the Find action can be referred to while you're doing the Replace action. One important note with these backreferences, though, is that they cannot be used inside character classes, and there is no real reason you would need to. They are two fundamentally different concepts. A character class defines a set of characters to match, whereas a backreference refers to literal data that got matched, so the concepts really shouldn't overlap. Now, as far as support for these goes, most regex engines support backslash one through backslash nine.

Some regex engines actually support backslash ten through backslash ninety-nine. I find that one through nine is usually enough, and I think it's best to try and stay within that limit; then, if you find that you really do need more than that for a special case, check to see if your platform supports it. Some regex engines, instead of using a backslash followed by a number, use a dollar sign followed by a number. So if you find that the backslash backreference isn't working, you might just try the dollar sign, and see if that works for you instead. So let's see some concrete examples.
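Which notation applies depends on the tool. As one illustration, Python's re.sub accepts the backslash form in its replacement string, while some other tools (JavaScript's string replace, for example) expect the dollar-sign form. The sketch below assumes Python, and the sample strings are made up for the demonstration.

```python
import re

# In the replacement string, \1 and \2 refer to the captured groups.
swapped = re.sub(r"(apples) to (oranges)", r"\2 to \1", "apples to oranges")
print(swapped)  # oranges to apples

# \g<1> is Python's unambiguous spelling of \1. It is needed when a
# literal digit follows, because \11 would be read as group eleven.
numbered = re.sub(r"(item)", r"\g<1>1", "item")
print(numbered)  # item1
```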

Let's say we've got a regex that is (apples) to \1; that is, apples being captured in a group, then space, t, o, space, and then backslash one, a backreference to what was just captured, so it refers back to apples. So of course it matches apples to apples. It captured it and referred to it all in the same expression. It would not match apples to oranges; it would not match apple to apple. It would only match apples to apples. Of course, you could use multiple backreferences in the same expression. So if you had (ab)(cd)(ef), each one being captured, then we can refer to those with backreferences three, two, and one, in the reverse order: (ab)(cd)(ef)\3\2\1. That would match abcdefefcdab.
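Both examples can be checked with a short script. This is a sketch in Python's re module; the variable names are hypothetical.

```python
import re

# \1 must repeat exactly what group 1 captured.
same_fruit = re.compile(r"(apples) to \1")
print(bool(same_fruit.search("apples to apples")))   # True
print(bool(same_fruit.search("apples to oranges")))  # False
print(bool(same_fruit.search("apple to apple")))     # False

# Three groups, referenced in reverse order.
mirrored = re.compile(r"(ab)(cd)(ef)\3\2\1")
print(bool(mirrored.fullmatch("abcdefefcdab")))  # True
print(bool(mirrored.fullmatch("abcdefabcdef")))  # False
```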

Now, both of those examples are just using literal text, but we can put any expression inside that captured group. For example, let's put alternation in there. So let's say we are looking at an HTML document, and we are looking for any tag that is either an i tag or an em tag. So you see there in the captured group in parentheses, I have got i, or em. For the text inside the tag, I have made the wildcard not greedy, so we don't accidentally skip over to the end of another tag. At the end, the closing tag should match.

So I have got the forward slash, followed by a backreference to whatever was matched before. So it matches hello inside i tags, and hello inside em tags, but it does not match hello inside some combination of i and em. It has to be the same tag both times. It's the literal text that got matched, not the expression. Let's try these out for ourselves. So let's start with apples to apples, and then for our regular expression, we are going to put in apples, capture that, then space, to, space, and backslash one.

So it matched the whole thing. It doesn't match apple to apples, or any other combination, but it does match exactly this phrase. Let's try our second example: A, B; C, D; E, F; and then backslash 3, backslash 2, backslash 1. And then down here, A, B; C, D; E, F; E, F; C, D; A, B. We can put in more here. Let's say we had G, H, and I, J, and then make the reference here out of sequence.

After the E, F, it's expecting G, H, and I, J, and after the A, B, it's expecting G, H, and I, J. Let's try our example with the HTML tags. I am just going to paste in a couple here, and let's try crafting that real quick. So, we know that we need tags, and inside that tag, we have either i or em. So it matched each one of those. And then after that, we don't care what it is, so we'll put in our wildcard, and we'll make it not greedy. If we take away the laziness, you will see how it stretches all the way to the end. Make it not greedy again, and let's put the first part of our closing tag there. See? We are still in good shape.

We know we need the forward slash, and then let's make our backreference, backslash one. Now, notice what it matches here. Italics matched, emphasis matched, and this one appears to have matched, but it didn't actually. Notice that what it's doing is stretching the wildcard further, even though it's lazy, in order to get the match. It's saying, ah! I see an i tag here, and I see an i tag here. If we take either one of those out, then you will see that it doesn't match either of them. So it has to be the same text both times. And you could then come back and continue to enhance this, and say, alright, I am also looking for b tags and strong tags.
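The tag example, including the extra b and strong alternatives, might be sketched like this in Python. The exact pattern and the sample strings are assumptions based on the description; the course's exercise file may differ.

```python
import re

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
tag_pair = re.compile(r"<(i|em|b|strong)>.+?</\1>")

samples = [
    "<i>hello</i>",
    "<em>hello</em>",
    "<strong>hello</strong>",
    "<i>hello</em>",  # mismatched tags
    "<em>hello</i>",  # mismatched tags
]
for sample in samples:
    print(sample, bool(tag_pair.fullmatch(sample)))

# A lazy wildcard still stretches as far as it must to find a match:
# here the opening <i> pairs with the </i> at the very end.
stretched = tag_pair.search("<i>hello</em> and <em>hi</i>")
print(stretched.group(0))
```

The last case is the one that only appears to match a mismatched pair: the engine keeps extending the lazy wildcard until it finds a closing tag with the captured name.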

Then it would respond to those as well: b for bold, and let's do strong, with my forward slash there. Alright, I think that's a good practical example working with HTML tags. Let's try another one. Let's say that we had a list of people, and we wanted to find people whose first name is repeated in their last name. So, for example, John Johnson; John occurs twice in it. So what we are going to do is say, anytime we have a word boundary, and then we know that the first thing is going to be [A-Z] for a capital letter, followed by [a-z] for lowercase letters, repeated, let's capture that whole thing.

That's their first name, followed by backslash b, for the word boundary. But what we are interested in is not just finding each of the names. We are interested in anything that has a space, followed by a word boundary, and then backslash one, and then son at the end. So now we have John Johnson, Evan Evanson, and you can put in other combinations here in place of son if you wanted other variations, but you can see now we have names that are repeated, like John Johnson, and Evan Evanson. But it did not pick up Eric Erikson, because of course, it's spelled differently.
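One plausible reading of the pattern just described, written out in Python. The + quantifier on the lowercase letters is an assumption, since the transcript only says "repeated".

```python
import re

# Group 1 is the first name; \1son requires a last name built from it.
repeated_name = re.compile(r"\b([A-Z][a-z]+)\b \b\1son")

people = ["John Johnson", "Evan Evanson", "Eric Erikson"]
found = [person for person in people if repeated_name.search(person)]

print(found)  # ['John Johnson', 'Evan Evanson']
```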

Let's try one more useful example. Let's say we have a phrase like, Paris in the the spring. Maybe you've seen this before; this is an exercise in how your brain works. You don't notice the double words right away. Well, with regex, we can find those double words. Let's say we are looking for anything that starts at a word boundary and is made up of word characters, then after that whitespace, which can be repeated, and then that same word again, followed by another word boundary. We're capturing the word, so it's anything between a boundary and some spaces (we don't care how many spaces there are, and a new line or return doesn't matter), and then whatever that word is gets repeated the second time.
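The double-word check can be sketched as follows in Python. The pattern is one reasonable rendering of the spoken description, and the cleanup step with re.sub is an added illustration rather than part of the video.

```python
import re

# A word, then whitespace (spaces or line breaks), then the same word again.
double_word = re.compile(r"\b(\w+)\s+\1\b")

text = "Paris in the the spring"
found = double_word.search(text)
print(found.group(0))  # the the
print(found.group(1))  # the

# Replacing the whole match with group 1 removes the duplicate.
cleaned = double_word.sub(r"\1", text)
print(cleaned)  # Paris in the spring

# Because \s also matches a newline, a repeat across a line break is found too.
print(bool(double_word.search("in the\nthe spring")))  # True
```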

This can be a very helpful regular expression that you can run on a document, and make sure that you didn't accidentally duplicate two words. So now I think you have a good understanding of the fundamentals of how captured groups, and backreferences to those captured groups, work. In the next movie, let's talk about how backreferences work with optional groups.

This video is part of

Using Regular Expressions

59 video lessons · 11685 viewers

Kevin Skoglund
Author

 
  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
