Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In this movie, I want us to see how to write a regular expression that will match e-mail addresses. I want us to do this after we've first talked about postal codes, because we are going to run into the same issues that we ran into there. Now, we all know what a basic e-mail address looks like, right? It's the username for the person, followed by the at sign, followed by the domain where their account is located. And that will help route the e-mail to the correct domain, and then from there, it can be routed to the correct person. Let's try and write the basic regular expression for this, and then we will consider what exceptions and edge cases we need to take into account.
So to start with, let's just start with a simple e-mail address, someone@nowhere.com. Alright, so how do we write a regular expression for that? We know we need to use our anchors; we've learned that by now. We'll put in the multi-line anchors, so that we can have more than one line here. And let's write that the first character needs to be a word character, and there can be one or more of those, followed by an actual at sign, followed by one or more word characters, and then a literal dot -- don't forget to escape it, because it's not a wildcard -- and then at the end, .com; we can match that by saying that it's any word character, and that there can be three of them.
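The first pattern described above can be sketched in Python roughly like this. This is only an illustration, not what appears on screen in the movie: the variable names and the sample addresses are assumptions, and `re.MULTILINE` stands in for the multi-line anchors.

```python
import re

# Anchored, multi-line pattern: one or more word characters, a literal "@",
# one or more word characters, an escaped (literal) dot, then exactly
# three word characters.
BASIC_EMAIL = re.compile(r"^\w+@\w+\.\w{3}$", re.MULTILINE)

# Hypothetical sample input, one candidate per line.
sample = "someone@nowhere.com\nsomeone@somewhere.co.uk\nnot an address"

# findall returns the whole match for each line that fits the pattern.
matches = BASIC_EMAIL.findall(sample)
print(matches)
```

Only the first line fits at this stage; the second line has an extra dot in its domain and a two-letter ending, which this pattern does not yet allow.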
Now, this matches our first example pretty well. But what if we had someone@somewhere.co.uk? That's a valid and very common form as well. Country codes have just two letters at the end. So obviously, we want to say that this can be two, comma, three, but we still didn't match it, because there is this period here in the middle of the domain. We have to allow for that as well. So instead of just saying that what can go inside this area is a word character, let's change it to say it's a word character, or it can be a period as well.
And that will make sure that somewhere.co.uk gets validated. In reality, we can't use all word characters at the end. You can't have .00 at the end. It has to be a letter. So let's go ahead and specify that a little more, and say that it actually can be an uppercase letter or a lowercase letter, A to Z. That still matches both our cases. Here, in the username, the \w's are correct: we can use digits, and we can use underscores. So this will match the basics. However, there are some edge cases that we do need to take into account.
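As a rough sketch of where the expression stands at this point, here is one way it might look in Python. The helper name `is_match` and the test addresses are hypothetical, chosen just to exercise the two changes: periods allowed inside the domain, and a letters-only ending of two or three characters.

```python
import re

# Domain part now accepts word characters or periods; the ending must be
# two or three letters (upper or lower case), not digits or underscores.
DOMAIN_AWARE = re.compile(r"^\w+@[\w.]+\.[A-Za-z]{2,3}$", re.MULTILINE)

def is_match(address):
    """Return True if the address fits the pattern (illustrative helper)."""
    return DOMAIN_AWARE.search(address) is not None

print(is_match("someone@nowhere.com"))      # three-letter ending
print(is_match("someone@somewhere.co.uk"))  # period inside the domain
print(is_match("someone@nowhere.00"))       # digits at the end are rejected
```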
Let's go ahead and make this into a character set. And in addition to just having letters, numbers, and underscore, it's also perfectly legal to have a dot, a percent sign, a plus sign, or a hyphen. We'd need to escape that one, just to make sure that it's clear that that is an escaped dash, and not a range character. Now, you may never have encountered an e-mail address that includes any of these. However, there is a standards body that sets these. So you can look out on the Internet, and find the standard for what an e-mail address is allowed to be, and then write your regular expression to match that standard.
Same thing in the last part here. The domain can actually have a period in it; it can also have a hyphen. So we have now written a pretty good one that matches most cases. It matches all of the country possibilities, it matches .edu, it matches .gov, it matches .mil, it matches .net; all these different possibilities that you're used to seeing as the trailing domain name will be allowed for. However, there are actually some domain names that are rare that you might not know about.
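Putting the username character set and the domain character set together, the expression might look something like the following in Python. Treat it as a sketch under assumptions: the constant and function names and all of the sample addresses are made up for illustration.

```python
import re

# Username: word characters plus dot, percent, plus, and an escaped hyphen.
# Domain: word characters plus dot and an escaped hyphen.
# Ending: two or three letters.
EMAIL = re.compile(r"^[\w.%+\-]+@[\w.\-]+\.[A-Za-z]{2,3}$", re.MULTILINE)

def looks_like_email(address):
    """Check format only; says nothing about whether the domain exists."""
    return EMAIL.search(address) is not None

print(looks_like_email("first.last+news@mail-server.example.org"))
print(looks_like_email("50%_off@shop.co.uk"))
print(looks_like_email("someone@example"))  # no dot-plus-letters ending
```

Escaping the hyphen inside the square brackets is not strictly required when it comes last in the set, but it makes it obvious to the next reader that it is a literal dash and not a range.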
In addition to all the two letter country codes, and the three letter domain names that you're used to, there are also a number of other ones here in this list -- like museum and travel -- that are much longer. Some of them are four letters, but some of them are as many as six letters long, and these are constantly changing and being added to. Now, you may decide that this is good enough for your purposes, or you may decide that you want to allow four characters, or even six or seven characters here. Or maybe you want to make it unlimited, and say I don't care how many characters are at the end. That's really up to you as to what you are trying to do.
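To see how that choice plays out, here is a small Python sketch that builds the same expression with different quantifiers on the ending. The function `build_email_pattern` and the sample addresses are hypothetical; the only thing that varies is how many letters are allowed after the last dot.

```python
import re

def build_email_pattern(tld_quantifier):
    """Compile the e-mail pattern with a caller-chosen quantifier on the
    trailing letters, e.g. "{2,3}", "{2,6}", or "{2,}" (illustrative)."""
    return re.compile(r"^[\w.%+\-]+@[\w.\-]+\.[A-Za-z]" + tld_quantifier + r"$")

short = build_email_pattern("{2,3}")     # country codes and .com-style endings
longer = build_email_pattern("{2,6}")    # also allows museum and travel
unlimited = build_email_pattern("{2,}")  # any length of two letters or more

for name, pattern in [("short", short), ("longer", longer), ("unlimited", unlimited)]:
    print(name,
          bool(pattern.search("curator@example.museum")),
          bool(pattern.search("someone@example.photography")))
```

The tighter the quantifier, the more often it has to be revisited when new endings are introduced; the unlimited version never needs updating but accepts any run of letters.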
In all of these cases, the expression doesn't ensure that the domain actually exists. Now, I think this is a pretty good regular expression that will match most cases, and this is something like what I might use. I just want to show you that if you prefer, you can replace what's at the end with something like this. What this will do is actually check that the ending is either a two letter country code, A to Z, or one of these other domains. So now we at least make sure that it is an actual domain. Typing in ggg for a domain name would not work.
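A version with an explicit list at the end could be sketched in Python along these lines. The list here is an assumption and deliberately incomplete: it only contains endings mentioned in this movie plus org, and a real list would be longer and would change over time.

```python
import re

# Illustrative, incomplete list of longer endings (an assumption, not the
# list shown in the movie).
KNOWN_TLDS = ["com", "net", "org", "edu", "gov", "mil", "museum", "travel"]

# Ending is either any two letters (a country code) or one of the listed names.
STRICT_EMAIL = re.compile(
    r"^[\w.%+\-]+@[\w.\-]+\.(?:[A-Za-z]{2}|" + "|".join(KNOWN_TLDS) + r")$"
)

def has_known_ending(address):
    """True if the address is well formed and ends in an accepted domain."""
    return STRICT_EMAIL.search(address) is not None

print(has_known_ending("someone@somewhere.co.uk"))
print(has_known_ending("curator@example.museum"))
print(has_known_ending("someone@nowhere.ggg"))  # not in the list
```

Note that the two-letter branch still accepts any pair of letters, so it checks the shape of a country code rather than whether that code has actually been assigned.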
Now personally, I think this can be a little bit tough to maintain: you have to go back to old programs and old code that you've written before, and keep updating them with whatever the latest set of domain names is. So it's up to you. If you need this kind of precision, if you really need to make sure that it is a valid domain name, then you will want to plug something like this in. If not, you may just want to go with something simpler like this, that just ensures that the address is properly formatted, even though the domain may not actually exist. It's the trade-off that we've seen a couple of times now, where you're choosing between precision on one side, and simplicity and maintainability on the other.