Shorthand character sets

From: Using Regular Expressions

We learned how to define character sets, and then we saw how ranges can save us a lot of typing inside those character sets. Now we're going to save even more typing by learning about shorthand character sets. Shorthand character sets all begin with a backslash, followed by a letter: \d for digit, \w for word character, and \s for whitespace. Don't let it throw you off that whitespace is \s, as in space; \w is already taken by word character. Then we have the three capital versions as well--\D, \W, and \S--which are the opposites: not a digit, not a word character, and not whitespace.

You can see that in the table, I've written out the shorthand on the far left and the equivalent on the far right. Notice in the equivalent for whitespace that whitespace is a space, a tab, or a line return; it's any of those things. Notice also that a word character is any upper- or lowercase letter, as well as any digit and the underscore. That's because regular expressions have their foundation in UNIX, and in UNIX it's very common to have underscores and numbers in a file name, so they're allowed to be word characters here, even though we normally don't think of them that way.
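The table itself did not survive in this transcript, so here is a rough sketch of the equivalences it describes, checked in Python. The written-out forms, the sample string, and the helper name same_matches are assumptions for illustration. The re.ASCII flag is used because Python's shorthands otherwise also match non-ASCII characters, and Python's \s additionally includes form feed and vertical tab, which the sample avoids.

```python
import re

# Written-out equivalents as described in the lesson (assumed spellings).
EQUIVALENTS = {
    r"\d": r"[0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\s": r"[ \t\r\n]",
    r"\D": r"[^0-9]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\S": r"[^ \t\r\n]",
}

# Hypothetical sample text with letters, digits, underscore, hyphen,
# a space, a tab, and a line return.
SAMPLE = "File_01-draft v2.txt\tdone\n"


def same_matches(shorthand: str, long_form: str, text: str = SAMPLE) -> bool:
    """Return True if both patterns match exactly the same characters."""
    return (re.findall(shorthand, text, re.ASCII)
            == re.findall(long_form, text, re.ASCII))


for shorthand, long_form in EQUIVALENTS.items():
    print(shorthand, long_form, same_matches(shorthand, long_form))
```

Each line should report that the shorthand and its long form agree on this sample; other engines may define the shorthands slightly differently.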

So you can see that the shorthands are much shorter than writing the equivalent version, and that can save you a lot of time, and it can also help reduce mistakes. But it can also lead to mistakes if you don't think about what these are actually equivalent to, especially with the word character and the whitespace. Another important point about \w is that underscore is a word character, but hyphen is not a word character; it's considered punctuation. So be careful about that and don't let that throw you off either. Underscore is included, hyphen is not, and digits are included as well.

Let's take a look at some examples. If we have the shorthand for digit four times, \d\d\d\d, then that's a four-digit number. It matches 1984, but it does not match a four-letter word, like text. Now if we have the word character shorthand three times, \w\w\w, that would match ABC, which makes sense because those look like word characters. But it also matches 123, where you might think, oh wait, those are digits, that should be a different thing. No, digits are still included here. It also matches 1_A, which looks nothing like a word, but it's still made up of what are considered word characters.
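These examples can be tried outside a regex tester as well. The following is a minimal Python sketch; the variable names are only for illustration, and fullmatch is used so that the whole sample string has to fit the pattern.

```python
import re

four_digits = re.compile(r"\d\d\d\d")      # four digit characters
three_word_chars = re.compile(r"\w\w\w")   # three word characters

# A four-digit number matches; a four-letter word does not.
print(bool(four_digits.fullmatch("1984")))
print(bool(four_digits.fullmatch("text")))

# Letters, digits, and underscore all count as word characters.
for sample in ["ABC", "123", "1_A"]:
    print(sample, bool(three_word_chars.fullmatch(sample)))
```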

Be careful with that. And then I've shown you how to use the whitespace shorthand. I've got a word character, followed by a whitespace character, followed by two word characters, \w\s\w\w, which matches I am, but not Am I. Now of course, if you really wanted just a real space, there is no need to use the shorthand; you can just put a literal space in there and that'll match. The shorthand is only for when there could also be a tab or line return in there. If you want to allow those possibilities, then you use \s. You can also put these shorthand character sets inside a character set.
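As a quick check of the whitespace example, here is a small Python sketch. The tab-separated sample is an added assumption, included to show where \s and a literal space differ.

```python
import re

shorthand = re.compile(r"\w\s\w\w")  # word char, any whitespace, two word chars
literal = re.compile(r"\w \w\w")     # same shape, but only a literal space

print(bool(shorthand.search("I am")))   # one letter, space, two letters
print(bool(shorthand.search("Am I")))   # two letters, space, one letter
print(bool(shorthand.search("I\tam")))  # tab instead of a space
print(bool(literal.search("I\tam")))    # the literal space rejects the tab
```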

So for example, we could have a character set that is made up of any word character and the hyphen, [\w\-]. That can be very useful for including hyphenated words in our matches. We can also say, well, we're looking for any one character that is a digit character or a whitespace character, so we can combine those side by side: [\d\s]. We can also use negation with them, so we could have anything that's not a digit, [^\d]. Of course that's the exact same thing as if we'd used the not-a-digit shorthand, \D, or written it out the long way. A word of caution here, though, about using negatives, especially when we already have these negative shorthands: a negated character set containing digit and whitespace, [^\d\s], is not the same thing as a character set containing not-a-digit and not-a-whitespace, [\D\S].
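To illustrate that a negated set, the negative shorthand, and the long form all say the same thing for digits, here is a short Python sketch. The sample text is made up for illustration and is plain ASCII, so the three spellings agree.

```python
import re

text = "Room 42, floor 7"  # hypothetical sample

negated_set = re.findall(r"[^\d]", text)    # negated character set
negative_short = re.findall(r"\D", text)    # not-a-digit shorthand
long_form = re.findall(r"[^0-9]", text)     # written out the long way

# All three leave out the digits and keep everything else.
print("".join(negated_set))
print(negated_set == negative_short == long_form)
```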

Let me break it down and show you the difference. The first one says the whole thing is negated; it's not a digit or whitespace character. The second one says either: it's either not a digit or it's not a whitespace character. The negative is applied differently in each case. We'll take a closer look at that in the examples. Now as far as support for the shorthand character sets goes, in general, they're going to be in all regular expression engines. They started out with Perl and spread very quickly to all modern regex engines because they're so useful, so all languages are going to support them.

A lot of the older UNIX tools do not support them, so there you'll have to write these out the long way; you can't use the shorthand. For that reason, if you're writing a regular expression that really needs to be portable and needs to work in the UNIX environment or inside a UNIX tool, then you will want to write it out the long way. But if you're just using it inside your programming language--let's say you're programming something in Perl--then go ahead and feel free to use these shorthands. All right, let's try this out in RegexPal. To start with, let's do our four digits--\d\d\d\d--and that should match 1984, but it should not match text. Nothing special there.

Let's try up here, let's put our \w\w\w. Now notice this matched both the digits--and I'll go and put a fourth one--it matched both the digits and it matched the word. If we had something here that was 1_5W, it matches that too. Let's talk about hyphens for a second. Let's say we had blue-green paint, right, and we want to match that. If we just had a \w, it matches everything but the hyphen and the space there.

If we wanted to include that hyphen, then we make a character set, and in that character set we need to put our hyphen and, remember, we want to escape the hyphen whenever it's in a character set as well. So now it says anything that's a word character or a hyphen. Let's try another character set. Let's put down here 123456789 and then abc, and up here, let's put in our character set a digit or a space. Anything that's a digit or a space is what we want to match. So you can see it did that, and it did not match abc.
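The same two experiments can be sketched in Python. The exact text typed into the tester is not shown in the transcript, so the digits-and-letters sample below is an assumption; findall collects each single-character match, and the matches are joined to show what was highlighted.

```python
import re

paint = "blue-green paint"

# \w alone skips the hyphen and the space.
print("".join(re.findall(r"\w", paint)))
# Adding an escaped hyphen to the set picks up the hyphen too.
print("".join(re.findall(r"[\w\-]", paint)))

sample = "123 456 abc"  # assumed sample: digits, spaces, then letters

# A digit or a whitespace character matches; the letters do not.
print(repr("".join(re.findall(r"[\d\s]", sample))))
```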

If we said we want anything that is not a digit or space, [^\d\s], you can see we get the opposite of that, right? It negated it, and it gave us what we would expect: a, b, and c. Now to show you the difference, let's look at this one, [\D\S]. What we're saying here is anything that is either not a digit or not a space. So it goes to the first character. It says, all right, this number 1 right here, is that not a digit? No, it is a digit. But is it not a space? Oh yes, it's not a space, so it's a match. Then it goes to the next one, number 2, same thing. This is not a space; it qualifies.

When it gets to the space between them it says, ah! This is not a digit; it qualifies. So you see how the negation there is applied differently between the two. I think most of the time you will just find yourself using the lowercase ones anyway, but if you do start to use those uppercase ones and you start to combine them together, just take that extra step to logically think through what it's going to match, and try it on a few samples to make sure you've got it right.
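The difference between the two negations can be checked with a few lines of Python. The sample string is an assumption, since the transcript does not show the exact test text.

```python
import re

sample = "123 456 abc"  # assumed sample: digits, spaces, then letters

# Whole set negated: a character that is neither a digit nor whitespace.
all_negated = "".join(re.findall(r"[^\d\s]", sample))

# Either negative: a character that is not a digit OR not whitespace.
# No character is both a digit and whitespace, so every character qualifies.
either_negated = "".join(re.findall(r"[\D\S]", sample))

print(repr(all_negated))
print(repr(either_negated))
print(either_negated == sample)
```

The first pattern keeps only the letters, while the second matches every character in the sample, which is why it is worth testing combined uppercase shorthands on a few samples.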

This video is part of

Using Regular Expressions

59 video lessons · 12260 viewers

Kevin Skoglund
Author

  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
