Matching names

From: Using Regular Expressions

In this movie, we're going to look at how to write a regular expression that will match names, and this will help us to illustrate some of the points that I was just talking about in the last movie. So, in order to match a simple list of names, we might think that the easiest solution is just to do any word character one or more times, and that does work. That's the most permissive possible thing, and it allows all of our names to match, but we've got a couple of problems there. The first is that it also returns a match if we have something like dollar sign Kevin. Now, it didn't match the entire thing, but it did find a match, and this can be a problem for us if we're doing something like form validation.

So we've got the characters in a form field, we submit those to our application, and our Web application says, let me run this regular expression, and make sure it's valid. It asks, do you find a match? And the answer is yes, I did find a match. Therefore, it thinks the data is valid, and goes ahead and lets it through. That's not what we really meant. What we really meant is that we want only word characters, and that symbols like this are not allowed. The way we should handle that is by using anchors at the beginning and end, or some other kind of delimiter, but in this case anchors. Notice now nothing matches; that's because I'm not looking at just a single field. I'm looking at multiple lines. I need to use multi-line anchors, so that those anchors will match at the beginning and end of each line.
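The patterns shown on screen are not part of the transcript, so the following is only a sketch of the three steps just described, written in Python. The sample strings and variable names are assumptions made for illustration.

```python
import re

# Hypothetical test input: one name per line, like the multi-line sample in the movie.
text = "Kevin\n$Kevin\nGeorge"

# Unanchored: a match is found *inside* "$Kevin", so a yes/no validity check passes.
unanchored = re.compile(r"\w+")
print(unanchored.search("$Kevin").group())  # Kevin

# Anchored, single-line mode: ^ and $ only match at the ends of the whole string,
# so nothing matches when the input contains several lines.
anchored = re.compile(r"^\w+$")
print(anchored.findall(text))  # []

# Anchored, multiline mode: ^ and $ now match at the start and end of each line.
multiline = re.compile(r"^\w+$", re.MULTILINE)
print(multiline.findall(text))  # ['Kevin', 'George']
```

For a single form field, the anchored pattern without multiline mode is usually the one you want; multiline mode is only needed here because several names are tested at once.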

So now it correctly matches the names, and it no longer matches the example with the dollar sign, but there's another problem here. It matches zero Kevin, the name with a leading digit, as well. That's because, remember, the word character includes digits and underscores. So what we really meant here was to have uppercase A to Z, lowercase a to z, inside a character set. Now it makes sure that it only includes alphabetic characters. Now we have some choices to make. Does the first letter have to be capitalized? Should an all-lowercase name be valid? Right now it is.
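As a rough illustration of the difference between the word-character shorthand and an explicit alphabetic character set, here is a small Python sketch. The sample names are made up for this example.

```python
import re

# Hypothetical sample lines: a plain name, a leading digit, an underscore, all lowercase.
names = "Kevin\n0Kevin\nkevin_s\nkevin"

# \w covers letters, digits, and the underscore.
word_chars = re.compile(r"^\w+$", re.MULTILINE)
print(word_chars.findall(names))  # ['Kevin', '0Kevin', 'kevin_s', 'kevin']

# An explicit character set restricts the match to ASCII letters only.
letters_only = re.compile(r"^[A-Za-z]+$", re.MULTILINE)
print(letters_only.findall(names))  # ['Kevin', 'kevin']
```

Note that the all-lowercase "kevin" still passes, which is the judgment call raised at the end of the paragraph above.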

If you want to have that first character capitalized, the best way to do that is just, at the beginning here, to say that the character set can be capital A to capital Z, followed by a mixture of upper and lower case letters. So now we should probably think about some edge cases. I mean, what if we had a name like J.R.? Is that allowed? If so, then we should add up here to our character set that the period is also valid. What about names that might have apostrophes in them? Or hyphenated names? These are the kinds of edge cases that you'll want to consider before you assume that every name is going to match your pattern.
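One way these choices might look in Python is sketched below. The exact quantifier and the set of extra characters are assumptions, since the transcript does not show the pattern; here the first letter must be a capital and the rest may be empty.

```python
import re

# Capital first letter, then any mix of upper and lower case letters.
capitalized = re.compile(r"^[A-Z][A-Za-z]*$")

# Assumed extension: also allow period, apostrophe, and hyphen after the first letter.
# The hyphen is placed last in the set so it is treated as a literal character.
permissive = re.compile(r"^[A-Z][A-Za-z.'-]*$")

samples = ["Kevin", "kevin", "J.R.", "O'Brien", "Mary-Kate"]

strict_ok = [s for s in samples if capitalized.match(s)]
loose_ok = [s for s in samples if permissive.match(s)]

print(strict_ok)  # ['Kevin']
print(loose_ok)   # ['Kevin', 'J.R.', "O'Brien", 'Mary-Kate']
```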

Alright, so that pretty well matches first names. Let's now think about if we had more than one name. Let's say we have George Washington. Right; we no longer have a match there, because our pattern doesn't allow a space. We could just put a space in the character set, and say okay, think of this as a match, and it's a perfectly acceptable way to do it. But what if we wanted to actually capture these two? What if we wanted to grab the first name and the last name separately? Well, one way to do that would just be to take this expression, copy it, and then put a space, and then repeat it, and now we can even capture the first one, and capture the second one.
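The copy-and-repeat approach could be sketched in Python roughly as follows. The `NAME` fragment is a hypothetical helper introduced here to avoid typing the same sub-pattern twice; it is not something shown in the movie.

```python
import re

# Hypothetical reusable fragment for a single name (capital letter, then letters).
NAME = r"[A-Z][A-Za-z]*"

# The same expression twice, separated by one space, each wrapped in a capture group.
two_names = re.compile(rf"^({NAME}) ({NAME})$")

match = two_names.match("George Washington")
print(match.groups())  # ('George', 'Washington')

# A single name no longer matches, because the space and second name are required.
print(two_names.match("George"))  # None
```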

So now we've grabbed each one of those, and we can work with them using backreferences. We saw how to do that earlier. We worked with the Presidents' file, and we actually flipped their names around, and put Washington in front of George. Okay, but then what about the case when we have a middle name? How should we handle that? John Quincy Adams is a President as well, but it no longer matches our pattern, because we've only allowed for one space to be in here, and this has two spaces. Well, it kind of depends on what your purposes are. If your purpose is just simply to grab John Quincy, and throw it all in one field together, and you're okay with that, then you could put a space here, in the first name's character set, and just say, alright; if you encounter a first space, it belongs to the first name. There is no space allowed in the last name, so the only other space that's allowed is the one that's in between, delimiting the two. Therefore, John and Quincy become part of the first capture, and Adams becomes part of the second capture.
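Here is a minimal Python sketch of both ideas in this paragraph: swapping the captures with backreferences, and letting the first capture absorb a middle name. The replacement format with a comma is an assumption chosen for illustration.

```python
import re

# Two captures separated by a single space.
two_names = re.compile(r"^([A-Za-z]+) ([A-Za-z]+)$")

# Backreferences \1 and \2 in the replacement flip the order of the names.
swapped = two_names.sub(r"\2, \1", "George Washington")
print(swapped)  # Washington, George

# Adding a space to the first character set lets the first capture hold two names.
first_with_space = re.compile(r"^([A-Za-z ]+) ([A-Za-z]+)$")
print(first_with_space.match("John Quincy Adams").groups())  # ('John Quincy', 'Adams')
print(first_with_space.match("George Washington").groups())  # ('George', 'Washington')
```

The second pattern works because the last name's character set has no space, so the final space in the string is the only one that can act as the delimiter.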

What if you wanted to capture it separately, though? What if you wanted to actually grab the middle name if it exists? Well, then we don't want to allow the space there. Instead, what we want to do is say that we have the same expression in the middle, and that will then match John Quincy Adams, but it doesn't match George Washington anymore. In the process, we have broken that one. Now, you might think, well, let's just make the middle group optional. That will make that middle name optional, but it still didn't match it. Do you see why? It's because of the space here. If only the group is optional, we're saying that there should still be two spaces, right? Space, space. And there is no space, space in between George and Washington.

If we did add another space, look at that. Now it suddenly matches. So the fix is that we not only have our capture group for the middle name, but we also need to make the space part of what's optional. See how that works? It's a little awkward to read with the highlighting, but here's my capture group. It's just the middle name, and then what's optional is that same capture group, but with a space included. That's what's optional, and of course, it's a good idea to make that one non-capturing, so that what's really being captured is what we intend, and the optional grouping, the one that includes that extra space, is not being captured.
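The broken and the corrected versions can be compared side by side in a short Python sketch. The patterns are a reconstruction from the description, not the exact expressions from the movie, and `NAME` is a hypothetical helper.

```python
import re

NAME = r"[A-Za-z]+"  # hypothetical fragment for one name

# Broken: the middle group is optional, but both surrounding spaces are still required.
broken = re.compile(rf"^({NAME}) ({NAME})? ({NAME})$")
print(broken.match("George Washington"))            # None
print(bool(broken.match("George  Washington")))     # True (two spaces)

# Fixed: a non-capturing optional group wraps the space together with the middle name.
fixed = re.compile(rf"^({NAME})(?: ({NAME}))? ({NAME})$")
print(fixed.match("George Washington").groups())    # ('George', None, 'Washington')
print(fixed.match("John Quincy Adams").groups())    # ('John', 'Quincy', 'Adams')
```

Because the outer group is non-capturing, the group numbers stay at first, middle, last, and the middle capture is simply `None` when no middle name is present.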

Now, you can take this further, and you can make more choices. International names probably don't fit these patterns at all. There is a possibility that someone could have comma, Junior, or comma, Senior after their last name. You might want to allow for those kinds of cases, but you get the idea. What I want you to see is that there's no single solution to a problem. It's always a set of judgment calls, and always a matter of thinking about the data that you're actually trying to match, and what those edge cases are. And if you think about those things, you'll be able to use the basic regex rules that we've covered to come up with something that will work for you.

I want to show you one last example before we leave names behind. What if we had the President Martin Van Buren? What would our regular expression do? It would work; it still does match, but it would capture Martin as being the first capture, Van as being the second capture, and Buren as being the last capture. That seems to be correct in terms of the regular expression that we've written. However, Van Buren is actually his last name. This is a fundamental problem in regular expressions. They're not going to be able to solve all of your problems for you.

In John Quincy Adams, Quincy is the middle name; in Martin Van Buren, Van is part of the last name, but there's no way for the regular expression to tell the difference between those two. Now, you could try to write a regular expression where you made Van into a special exception, and said, if it's Van, then it gets put in the last name. And maybe you could even go through and come up with a list of all of those possibilities of what could potentially be in that last name, but I think that's tough to do. More likely, if you're trying to do some kind of data processing, this just requires a human to go back at the end, and review what your regular expression did, and make sure that special cases like this didn't fall through the cracks.
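To make the limitation concrete, here is a Python sketch of the Van Buren problem together with one possible special-exception pattern. The particle list is a deliberately tiny, hypothetical example; as noted above, a complete list is hard to build, so treat this as an illustration rather than a recommended solution.

```python
import re

NAME = r"[A-Za-z]+"              # hypothetical fragment for one name
PARTICLES = r"(?:Van|De|Von)"    # assumed, incomplete list of surname particles

# The optional-middle-name pattern treats "Van" as a middle name.
plain = re.compile(rf"^({NAME})(?: ({NAME}))? ({NAME})$")
print(plain.match("Martin Van Buren").groups())  # ('Martin', 'Van', 'Buren')

# Special exception: a negative lookahead keeps a listed particle out of the
# middle name, and the last-name group is allowed to start with that particle.
with_particles = re.compile(
    rf"^({NAME})(?: (?!{PARTICLES} )({NAME}))? ((?:{PARTICLES} )?{NAME})$"
)
print(with_particles.match("Martin Van Buren").groups())   # ('Martin', None, 'Van Buren')
print(with_particles.match("John Quincy Adams").groups())  # ('John', 'Quincy', 'Adams')
```

Any particle missing from the list still ends up in the middle-name capture, which is why a human review of the results is still needed.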

This video is part of

Using Regular Expressions

59 video lessons · 11667 viewers

Kevin Skoglund
Author

  1. 2m 18s
    1. Welcome
      56s
    2. Using the exercise files
      1m 22s
  2. 19m 55s
    1. What are regular expressions?
      3m 20s
    2. The history of regular expressions
      6m 40s
    3. Regular expression engines
      2m 44s
    4. Installing an engine
      4m 5s
    5. Notation conventions and modes
      3m 6s
  3. 21m 23s
    1. Literal characters
      6m 39s
    2. Metacharacters
      2m 1s
    3. The wildcard metacharacter
      4m 31s
    4. Escaping metacharacters
      4m 53s
    5. Other special characters
      3m 19s
  4. 31m 26s
    1. Defining a character set
      5m 49s
    2. Character ranges
      4m 49s
    3. Negative character sets
      4m 53s
    4. Metacharacters inside character sets
      5m 12s
    5. Shorthand character sets
      6m 30s
    6. POSIX bracket expressions
      4m 13s
  5. 36m 38s
    1. Repetition metacharacters
      7m 17s
    2. Quantified repetition
      6m 59s
    3. Greedy expressions
      6m 27s
    4. Lazy expressions
      6m 46s
    5. Using repetition efficiently
      9m 9s
  6. 20m 24s
    1. Grouping metacharacters
      4m 14s
    2. Alternation metacharacter
      4m 54s
    3. Writing logical and efficient alternations
      7m 33s
    4. Repeating and nesting alternations
      3m 43s
  7. 19m 19s
    1. Start and end anchors
      7m 21s
    2. Line breaks and Multiline mode
      4m 41s
    3. Word boundaries
      7m 17s
  8. 23m 33s
    1. Backreferences
      8m 57s
    2. Backreferences to optional expressions
      3m 51s
    3. Finding and replacing using backreferences
      7m 16s
    4. Non-capturing group expressions
      3m 29s
  9. 32m 31s
    1. Positive lookahead assertions
      6m 39s
    2. Double-testing with lookahead assertions
      7m 16s
    3. Negative lookahead assertions
      6m 10s
    4. Lookbehind assertions
      6m 26s
    5. The power of positions
      6m 0s
  10. 13m 13s
    1. About Unicode
      4m 19s
    2. Unicode in regular expressions
      4m 41s
    3. Unicode wildcards and properties
      4m 13s
  11. 1h 55m
    1. How to use this chapter
      5m 38s
    2. Matching names
      6m 33s
    3. Matching postal codes
      8m 54s
    4. Matching email addresses
      5m 0s
    5. Matching URLs
      8m 1s
    6. Matching decimal numbers and currency
      6m 45s
    7. Matching IP addresses
      7m 10s
    8. Matching dates
      7m 49s
    9. Matching times
      8m 59s
    10. Matching HTML tags
      8m 34s
    11. Matching passwords
      6m 49s
    12. Matching credit card numbers
      9m 36s
    13. Finding words near other words
      6m 38s
    14. Formatting with Search and Replace, pt. 1
      7m 22s
    15. Formatting with Search and Replace, pt. 2
      4m 15s
    16. Formatting with Search and Replace, pt. 3
      7m 10s
  12. 47s
    1. Goodbye
      47s
