Matching postal codes

From: Using Regular Expressions

In this movie, we are going to talk about how to match postal codes, or as we call them in the United States, ZIP codes. The first thing you will face when you are trying to match postal codes is deciding which country you are matching them for, because every country formats its postal codes differently. Now, we could write one long regular expression to match all of them, but because there are so many countries and so many different postal code formats, the better way is probably to use some logic in our program that first determines what the country is, and then applies the proper regular expression for that country.

To give you a sample of how you can approach this problem, though, I am just going to look at three different ones: the United States, Canada, and the United Kingdom. In the United States, the postal code format is one of two things. It's either five digits, or it's five digits, then a dash, then four more digits. That dash and four digits is often referred to as the plus four, but it's actually a dash in between them, not a plus sign. This is a pretty easy regular expression for us to write. We need to match either 75087 or 10010-6543.

Either of those is a valid ZIP code. What we want is to match any digit, repeated five times. That completely matches the first one and partially matches the second one. But here I want you to see that if we also just had a long series of numbers, it returns a match in that case too. We want to make sure that we are using anchors, so that we are matching the entire thing. We are saying that from start to finish, it should be only five digits. And because I've got multiple lines here, and I want the anchors to match at the start and end of each line, we need to use multi-line mode.
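As a minimal Python sketch of the anchoring step just described: the sample text and variable names here are illustrative assumptions, and Python's `re.MULTILINE` flag stands in for the multi-line mode mentioned above.

```python
import re

# Illustrative sample data: the two ZIP codes from the transcript
# plus a long run of digits that should not count as a ZIP code.
sample = "75087\n10010-6543\n1234567890"

# Without anchors, \d{5} also matches inside longer runs of digits.
unanchored = re.findall(r"\d{5}", sample)

# With ^ and $ in multi-line mode, a line must be exactly five digits.
anchored = re.findall(r"^\d{5}$", sample, flags=re.MULTILINE)

print(unanchored)  # ['75087', '10010', '12345', '67890']
print(anchored)    # ['75087']
```

For a single field of data rather than several lines, the flag could be dropped, as the transcript notes next.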

If you were just looking at a single field of data, you wouldn't need that. So now we successfully match 75087, but we no longer have any match on our plus four one, the one that has -6543 at the end. Let's add that as well: a dash, and then four digits after it. Now we've successfully matched that one, but not the first one, so the way to handle both is to make this part optional. Make it into a group, make that group optional, and in addition, it's a good idea to make it a non-capturing group, unless we are actually trying to capture that data.
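The finished ZIP code expression might look like the following in Python. This is a sketch rather than the exact expression shown on screen; the name `ZIP_PATTERN` and the sample text are assumptions made for illustration.

```python
import re

# ^ and $ anchor each line (multi-line mode); (?:-\d{4})? is the optional,
# non-capturing "plus four" group: a dash followed by four digits.
ZIP_PATTERN = re.compile(r"^\d{5}(?:-\d{4})?$", re.MULTILINE)

# Illustrative sample data: two valid ZIP codes and one run of ten digits.
sample = "75087\n10010-6543\n1234567890"

zip_matches = ZIP_PATTERN.findall(sample)
print(zip_matches)  # ['75087', '10010-6543']
```

Because the group is non-capturing, `findall` returns the whole match for each line. With a capturing group it would return only the group's contents instead.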

Now you can see that it correctly matches the five digit ZIP code, and it matches the one that has the dash and four digits at the end as well. Keep in mind the anchors, making the other part optional, and making it a non-capturing group.

Let's take a look at Canada. The Canadian postal code format is going to be A9A, space, 9A9. And in each of those spots where we have an A, it can actually be any of the characters A through Z. Where we have a 9, it can be any of the characters 0 through 9.

Well, we know how to do that with character sets, right? That's pretty easy. Let's try that out. So let's go ahead and take this out of here, and let's say A9A 9A9. That's our pattern. That means it could be something like B3Z, space, 2W3. Samples like that should be properly formatted Canadian postal codes. So let's take everything in between these anchors out of there, and let's start by having our first character set. It's all capital letters, A to Z, and then I am going to just copy that, because I'm going to need it a few times. And then let's do backslash, d for digit, paste in our A to Z, space, backslash, d for digit, paste in our A to Z, backslash, d for digit.
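Spelled out in Python, the pattern dictated above could look like this. It is a sketch under assumptions: `CANADA_PATTERN` is a made-up name, and apart from B3Z 2W3 the sample lines are illustrative additions, not data from the course.

```python
import re

# Letter, digit, letter, space, digit, letter, digit: the A9A 9A9 format.
CANADA_PATTERN = re.compile(r"^[A-Z]\d[A-Z] \d[A-Z]\d$", re.MULTILINE)

# B3Z 2W3 is the sample from the transcript; the other lines are
# illustrative: one more well-formed code, a lowercase one, and one
# with the space missing.
sample = "B3Z 2W3\nK1A 0B1\nb3z 2w3\nB3Z2W3"

canada_matches = CANADA_PATTERN.findall(sample)
print(canada_matches)  # ['B3Z 2W3', 'K1A 0B1']
```

The character sets only accept capital letters, so the lowercase line is rejected unless the input is uppercased first or the expression is made case-insensitive.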

That's all it is. We just use character sets to match the pattern exactly. Now, this will make sure that the postal code is formatted properly for Canada. But as with the United States one, neither of these says whether it actually is a real and existing postal code or not. It just makes sure that it's properly formatted, that it looks like a postal code, but it doesn't ensure that once we send it off to the post office, the post office won't say, sorry, that postal code doesn't exist. We could try and write something that would limit those.

However, I think that there are so many of these combinations in use, and they're constantly changing, with new ones being added all the time, that it may not make sense to try and limit it too much. Unless you want to take on the constant maintenance of this regular expression to keep it up to date with current postal codes, it could easily slip out of date. It's much better to just match the general format, make sure that it matches what seems like a correct postal code, and then let the post office decide whether it's actually deliverable or not.

As our final example, let's try the U.K. The United Kingdom postal code formats come in a number of flavors. We can have A9 9AA, where A and 9 represent the same things we just saw for Canada; any capital letter or any digit. Or they can be formatted A99 9AA, AA9 9AA, AA99 9AA, or A9A 9AA, and lastly AA9A 9AA.

So what we need to do is try and make some kind of sense out of these patterns, so that we can write a regular expression that will encompass them, because there are lots of different possibilities. If you start looking at this, you will notice that all of them end in space 9AA. So that's great. We can just reuse that for each of them. So then all we have to do is figure out a way to describe what comes before the space. Well, if you'll notice, it always starts off with either one or two letters, followed by either one or two digits, and then just for the last two lines, there is the possibility that there is another A that comes at the end.

Let's take a shot at writing something that will encompass these. So to start with, let's paste in our sample data, and again, we know that we want to use multi-line mode and put our anchors around the expression. Now let's try writing a character set: A, dash, Z. And let's copy that; we are going to need it a few times. After that, we know that there can be either one or two of those. Then comes a digit, and there can be either one or two of those as well. Then there can be another A to Z, but this time it's optional, because only the last two lines use that one. That's followed by a space, then backslash, d, and then a letter, which will occur two times. So there we go.

We wrote something that would match all of these expressions. However, in the process, if you check our logic very carefully, there is a problem. We allowed the possibility that this format works as well: AA99A 9AA. See? We got a match. But that is not a valid example. It's not possible to have two letters, followed by two digits, and have the optional A. If the optional A is there, the only ones that are valid have a single digit. So one way that we can fix this is by using alternation.
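A short Python sketch makes the flaw in this first attempt visible. The name `FIRST_ATTEMPT` is an assumption, and the format strings themselves are used as sample data, which works because A is a capital letter and 9 is a digit.

```python
import re

# First attempt: one or two letters, one or two digits, an optional letter,
# then a space, a digit, and two letters.
FIRST_ATTEMPT = re.compile(r"^[A-Z]{1,2}\d{1,2}[A-Z]? \d[A-Z]{2}$", re.MULTILINE)

# The six formats listed in the transcript, used directly as sample data.
valid_formats = ["A9 9AA", "A99 9AA", "AA9 9AA", "AA99 9AA", "A9A 9AA", "AA9A 9AA"]
# The format the transcript identifies as not valid.
invalid_format = "AA99A 9AA"

all_valid_match = all(FIRST_ATTEMPT.match(s) for s in valid_formats)
invalid_also_matches = FIRST_ATTEMPT.match(invalid_format) is not None

print(all_valid_match)       # True
print(invalid_also_matches)  # True: the first attempt is too permissive
```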

So let's take this expression here, and let's just copy it to start with, and let's put parentheses around it, and now let's use our alternation metacharacter, and I'm just going to paste the same expression a second time. It's a duplicate at the moment. What I want to do is say, alright, the first case is when we don't have the A to Z at the end at all, so I will take that out. That will match all of those possibilities except these last three lines here, if we include the one that shouldn't work, the one that should fail. And in the second case, I am going to take out the fact that it's optional. So now I am saying essentially the same thing that I had before.

Either it has the A to Z at the end, or it doesn't have the A to Z at the end. Well, in the case when the A to Z is at the end, no longer are we going to have one to two. Instead, we are only allowed to have one now. So now the edge case that shouldn't work no longer works. Just like with the U.S. and Canada, this allows for a lot of combinations of letters and numbers that don't actually exist as postal codes. What if we did want to restrict this to actual working postal codes? Well, we could check it against a postal code database, or we can narrow our regular expression.
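The alternation fix described above can be sketched in Python as follows. This is one reading of the spoken steps, not necessarily the exact expression from the video; the name `UK_PATTERN` is an assumption, and the group is written as non-capturing here on the same reasoning given for the ZIP code example.

```python
import re

# Two alternatives before the space:
#   1. one or two letters, then one or two digits (no trailing letter)
#   2. one or two letters, then exactly one digit, then a letter
UK_PATTERN = re.compile(
    r"^(?:[A-Z]{1,2}\d{1,2}|[A-Z]{1,2}\d[A-Z]) \d[A-Z]{2}$",
    re.MULTILINE,
)

# The six formats from the transcript, used directly as sample data.
valid_formats = ["A9 9AA", "A99 9AA", "AA9 9AA", "AA99 9AA", "A9A 9AA", "AA9A 9AA"]
invalid_format = "AA99A 9AA"

all_valid_match = all(UK_PATTERN.match(s) for s in valid_formats)
invalid_rejected = UK_PATTERN.match(invalid_format) is None

print(all_valid_match)   # True
print(invalid_rejected)  # True: the edge case no longer matches
```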

It turns out that Wikipedia lists a regular expression that will do exactly that. And I am pretty sure that this regular expression, or a version of it, was created and put out by the U.K. postal service. So this is an example where we could try and narrow it down, because we actually have an authoritative regex that's been provided to us by the postal service. And as that updates, we can just replace ours with their new regular expression. As long as they continue to publish it, we can go ahead and use it. This is what it looks like for the U.K. It may have been updated since I posted this, but this will ensure that we only find postal codes that really do match.

I had to wrap this regular expression between two lines to get it to fit. There is no space between the end of the first line, and the beginning of the second line. Those two square brackets sit back to back. So I think it's pretty cool that a government agency understands the power and the usefulness of making a regular expression like this available to the public. I wish a lot more governments would do this.
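To tie the three format-only patterns back to the suggestion at the start of this movie, here is a hedged Python sketch of choosing the expression by country. The dictionary `POSTAL_PATTERNS`, the function `is_well_formed`, and the country keys "US", "CA", and "GB" are all hypothetical names introduced for illustration. It checks format only and does not use the stricter expression mentioned above.

```python
import re

# Hypothetical lookup table: one format-only pattern per country key.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(?:-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d"),
    "GB": re.compile(r"(?:[A-Z]{1,2}\d{1,2}|[A-Z]{1,2}\d[A-Z]) \d[A-Z]{2}"),
}


def is_well_formed(country: str, code: str) -> bool:
    """Return True if code has the expected format for the given country."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        raise ValueError(f"no postal code pattern for country {country!r}")
    # fullmatch plays the role of the ^ and $ anchors for a single field,
    # so multi-line mode is not needed here.
    return pattern.fullmatch(code) is not None


print(is_well_formed("US", "10010-6543"))  # True
print(is_well_formed("CA", "B3Z 2W3"))     # True
print(is_well_formed("GB", "AA99A 9AA"))   # False
```

Keeping one small expression per country means each format can be changed or replaced without touching the others.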

Author: Kevin Skoglund