Join Michael Murphy for an in-depth discussion in this video, Building with wild cards, part of Learning GREP with InDesign.
There are many different kinds of GREP metacharacters. Some describe specific characters while others indicate how often certain characters or portions of an expression repeat. Some describe locations and others apply conditions to searches. In InDesign, these different metacharacter types are grouped together logically in the Special Characters menu. In this movie, we're going to take a look at the first group of metacharacters called Wildcards. I'm zoomed in on the second page of this layout so that we can better see the text on the screen and my cursor is inside of this body copy text.
In the Paragraph Styles panel, you can see the Body Text, which is the style in use, is highlighted. I'm going to right-click on that style name or Ctrl+Click on the Mac and choose Edit "Body Text". This opens the Paragraph Style Options dialog and I'm going to go to the GREP style area. I'm using GREP styles to demonstrate these metacharacters because it's the only way in InDesign to actually preview the results of your GREP expressions as you build them. It's very helpful to do that as you're learning this. But in order to make that work, you need to make sure that the Preview checkbox at the bottom is actually checked on.
Now the GREP Style work area doesn't look like a whole lot at first glance. It's just a big area with a New GREP Style button at the bottom. I'm going to click New GREP Style and my options here are to apply a style, which right now is set to None. If I click None, you'll see that it's actually a pull-down menu of all of the available character styles in this document. If I need to create a new character style, CS4 allows me to do that with this option at the bottom. For this example, I'm going to use just this Yellow Highlight style that's already built into the document.
Below that is the To Text field. If I click on that, instantaneously, because Preview is checked, you'll see that this default expression, which is actually any digit one or more times, is automatically applied to the text on the page. You can see all of the numbers are highlighted with this Yellow Highlight character style, which basically just puts a thick underline below the text. Now, this default cannot be changed, unfortunately, so I need to clear it out each time I start a new GREP Style.
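As a rough illustration outside InDesign, the default "any digit one or more times" expression can be tried in Python's `re` module, which uses the same `\d+` syntax. This is only a sketch: the sample sentence is invented for the example, and Python's engine is a stand-in for InDesign's, not the same implementation.

```python
import re

# Sample text is invented for illustration; it is not from the course file.
sample = "Chapter 12 opens on page 340, revised in 2009."

# \d matches any single digit; + repeats it one or more times,
# so each run of consecutive digits becomes one match.
digit_runs = re.findall(r"\d+", sample)

print(digit_runs)  # ['12', '340', '2009']
```

Each match corresponds to one stretch of text that the GREP style would highlight.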
As soon as I click off in this gray area, that commits the change. So I can see the results on the page. I can still cancel. I don't have to save these changes to the style. Nothing is actually saved until I click OK. But by switching back and forth between the To Text field and clicking in this area, I can see what's going on, on the page, as I build my GREP expression. We're going to take a look at each of the Wildcard metacharacters in the Special Characters menu. At the end of the field, I'll click the Special Characters menu icon, and I'm going down to Wildcards.
Any digit we've already seen. It's \d, and if I click in the gray area you can see what we saw before, that those numbers are highlighted. I'll click back in the To Text field, clear that out by hitting Delete. I'm going to go back to the Special Characters menu at the end of the field, to the Wildcards submenu, and I'm going to skip Any Letter for now because it's a unique one that we're going to deal with at the end. I'm going to go to Any Character, which is the broadest of all the wildcard metacharacters.
It's the most far reaching. When I select that, I get a period, meaning any character. If I click off in the gray area to preview the results, you can see it matches exactly that, any character, punctuation, uppercase and lowercase letters, digits, spaces, everything except a hard return. That's the only thing outside of the scope of the Any Character metacharacter. I'm just going to clear that metacharacter out of the To Text field, and let's take a look at the next wildcard in the list.
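The same idea can be sketched in Python, where the period also matches any character except a line break. The assumption here is that Python's newline plays the role of InDesign's hard return; the sample string is made up for the example.

```python
import re

# Invented sample: letters, punctuation, a space, a digit, and one line break.
sample = "Hi, 5!\nOk"

# By default the period matches every character except the newline.
any_char = re.findall(r".", sample)

print(len(sample), len(any_char))  # 9 8
print("\n" in any_char)            # False
```

Every character is matched individually, and only the line break is left out.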
Under the Wildcards submenu, I'm going to choose Any White Space. Fortunately, because I'm using a character style that actually highlights the space occupied by the character with a yellow underline, this will actually show up. If I choose Any White Space, I get the \s metacharacter. I'll click in the gray area and you can see it matches Any White Space. If I zoom in here, you can see that around this em dash, I'm using two InDesign custom white spaces.
They're a little bit thinner than the standard Spacebar white space. These are thin spaces, and they also match, because Any White Space is matched. In fact, I'm going to click OK while this is still part of the style. I'm going to go on to the page and come over to the beginning of one of these paragraphs and I'm just going to type in a Tab. A Tab also qualifies as a white space. You'll notice that as soon as I typed it, the character style was automatically applied. This is what GREP styles do. As I add more spaces, since Any White Space is what's being styled, every white space that I add is automatically and dynamically styled with that character style.
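A small Python sketch shows the same breadth for `\s`. It assumes the thin spaces around the em dash are the Unicode THIN SPACE character (U+2009); the sample string is invented, and Python's engine stands in for InDesign's.

```python
import re

THIN = "\u2009"  # Unicode THIN SPACE, assumed to match InDesign's thin space

# Invented sample: thin spaces around an em dash, then a tab and a normal space.
sample = f"word{THIN}\u2014{THIN}word\tnext one"

# \s matches any white space: regular spaces, tabs, and Unicode spaces.
spaces = re.findall(r"\s", sample)

print([hex(ord(ch)) for ch in spaces])  # ['0x2009', '0x2009', '0x9', '0x20']
```

The two thin spaces, the tab, and the ordinary space all match, while the em dash does not.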
I don't actually need these, so I'll undo that change altogether and go back to the Body Text style in the Paragraph Styles panel, right-click on it, choose Edit "Body Text" and let's take a look at the next wildcard. Once again, I'll create a new GREP style, since I undid the last one. I'm going to choose Yellow Highlight again. This is the default, unfortunately, that we're going to encounter every time we start one of these. I'll clear that out, and I'm going to choose the next wildcard in the list, which is Any Word Character.
Any Word Character applies the style that you choose to Any Uppercase Letter, Any Lowercase Letter, Any Digit, and the underscore character. So let's see what we get when we choose that. \w is the metacharacter itself. If I click off in the gray area, let's zoom out a little bit, so you can see some more here. You can see that it matches exactly that. Any Word Character, uppercase or lowercase letter, number or underscore, but it doesn't match white space or punctuation, so none of those are highlighted here.
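Here is a minimal Python sketch of `\w`. The `re.ASCII` flag is an assumption added so that the match is limited to the letters, digits, and underscore described above; the sample string is invented.

```python
import re

# Invented sample containing letters, an underscore, a digit,
# spaces, and punctuation.
sample = "File_2 saved, OK!"

# \w matches uppercase letters, lowercase letters, digits, and underscore.
# re.ASCII keeps the match to the plain A-Z, a-z, 0-9, and _ set.
word_chars = re.findall(r"\w", sample, flags=re.ASCII)

print("".join(word_chars))  # File_2savedOK
```

The spaces, the comma, and the exclamation point are skipped, just as they are left unhighlighted on the page.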
I'll clear out that Any Word Character wildcard and we'll take a look at the next wildcard in the list. That is Any Uppercase Letter, which is pretty straightforward. It's \u. When I click off in the gray area, you can see that it matches every uppercase letter, all of the ones at the beginning of sentences, the complete uppercase sentence. There's one little thing that you do need to be aware of when you choose this. I'm going to zoom in down here where this ligature is and you'll notice that InDesign is considering the whole ligature, both letters, as a single uppercase character even though part of it is actually a lowercase character.
InDesign has ligatures turned on, by default, and this is how GREP treats ligatures when it encounters them. Even though you can still select them as separate characters, they're considered by GREP to be one character. So this is just something that you need to be aware of. I'm going to go back here and I'm going to choose the next wildcard from the list. I'll clear that one out. From the Wildcards menu, I'll choose Any Lowercase Character. Again, fairly self-explanatory, the metacharacter is \l. If I click in the gray area, you can see that that's what it matches.
Let me zoom out here and I've matched all of my lowercase characters. You'll also notice that that ligature is not selected at all, even the h portion of it. So this treatment of ligatures by Any Uppercase Letter and Any Lowercase Letter metacharacters works both ways. I'm going to clear that out and click off here so that all of that formatting is removed. We're going to go back to the one item in that wildcard list that I skipped, and that is the Any Letter metacharacter.
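Python's `re` module does not support the `\u` and `\l` wildcards, so the sketch below substitutes the ASCII ranges `[A-Z]` and `[a-z]` as stand-ins. That substitution is an assumption made for illustration, and the ligature behavior described above is specific to InDesign and is not reproduced here.

```python
import re

# Invented sample text.
sample = "GREP Styles rock"

# [A-Z] stands in for InDesign's \u (Any Uppercase Letter).
uppercase = re.findall(r"[A-Z]", sample)

# [a-z] stands in for InDesign's \l (Any Lowercase Letter).
lowercase = re.findall(r"[a-z]", sample)

print("".join(uppercase))  # GREPS
print("".join(lowercase))  # tylesrock
```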
If I select that, you'll notice that the syntax here is quite different. It's actually the Any Lowercase and Any Uppercase Letter metacharacters enclosed in square brackets. Anything enclosed in square brackets is what's called a character set. So this is technically not a wildcard by itself. It's two wildcards combined together in a character set, which we'll discuss in a later movie. That's been included here in InDesign, because it's a very convenient choice.
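To see how a character set combines two classes into one, here is a hedged Python sketch. InDesign's Any Letter entry is presumably written as `[\l\u]`; because Python's `re` has no `\l` or `\u`, the set `[a-zA-Z]` is used as an ASCII stand-in, and the sample string is invented.

```python
import re

# Invented sample with letters, a space, a digit, and an underscore.
sample = "Tab 4_x"

# One set holding both ranges: a character matches if it belongs to either.
letters = re.findall(r"[a-zA-Z]", sample)

# The same result written as two separate classes joined by alternation.
either = re.findall(r"[a-z]|[A-Z]", sample)

print(letters)            # ['T', 'a', 'b', 'x']
print(letters == either)  # True
```

Unlike `\w`, the set leaves out the digit and the underscore, which is why Any Letter is narrower than Any Word Character.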
It's useful to have, but it's actually not a true and unique wildcard. So wildcards stand in for particular types of characters, from something as broad as Any Character to something more specific like Any Digit or Any Lowercase Character. Except for that Any Letter option, those are all of the documented InDesign wildcards you'll see available in this list. However, there are a few wildcards that are not in the Special Characters menu, nor are they listed in the Help files.
We'll take a look at some of those hidden gems, in the next movie.
- Using metacharacters, the building blocks of GREP
- Describing text that may not exist with zero operators
- Applying multiple character styles to the same text with GREP styles
- Eliminating orphaned words at the ends of paragraphs
- Preserving and recalling subexpressions
- Customizing a GREP-based text cleanup script for long documents
Skill Level Intermediate
Q: In the “Dynamically fixing orphaned words with GREP” tutorial the author uses the term:
In an earlier course the author described the + (one or more) modifier as unusable in a lookbehind or lookahead i.e. (?<=.+). What's the difference here?
A: The limitation mentioned in an earlier movie referred only to positive lookbehind and negative lookbehind. I was able to use the one or more times (+) metacharacter in the positive lookahead portion of the expression because that limitation doesn't affect either positive or negative lookahead. It's only when looking backward that GREP ignores the repeat metacharacters.
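The same asymmetry can be observed in Python's `re` module, with one difference worth noting as an assumption of this sketch: Python refuses to compile a variable-width lookbehind and raises an error, rather than ignoring the repeat metacharacter. The patterns and sample text are invented for illustration.

```python
import re

# A repeat metacharacter inside a lookahead works: match a word only
# when white space and one or more digits follow it.
lookahead_hits = re.findall(r"\w+(?=\s+\d+)", "see page 12 now")
print(lookahead_hits)  # ['page']

# The same repeat inside a lookbehind is rejected by Python's engine,
# because a lookbehind must have a fixed width there.
try:
    re.compile(r"(?<=.+)x")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(lookbehind_ok)  # False
```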