Introduction to regular expressions
These both create the same variable. Just as we can create a new array object or use square brackets as a shortcut, we can create a new RegExp object or use forward slashes as a shortcut. Now this is about as simple a pattern as you can get: it just looks for the word hello anywhere in the string it's matched against. So I can create a new string, in this case called myString, and call the test method of my regular expression, passing in my string.
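A minimal sketch of the two creation styles and the test call described above. The exact on-screen code isn't in the transcript, so this assumes the literal and constructor forms look like the following. The name myRE2 and the sample text in myString are hypothetical, and console.log stands in for alert so the snippet also runs outside a browser.

```javascript
// Literal syntax: the pattern sits between forward slashes.
const myRE = /hello/;
// Constructor syntax: the pattern is passed to RegExp as a string.
// (myRE2 is a hypothetical name used only for this comparison.)
const myRE2 = new RegExp("hello");

const myString = "Well, hello there!";

// test() returns true if the pattern matches anywhere in the string.
if (myRE.test(myString)) {
  // The course pops up an alert(); console.log is used so this also runs in Node.
  console.log("Found hello");
}

// Both objects hold the same pattern text.
console.log(myRE.source === myRE2.source); // true
```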
So does myRE.test(myString) find the word hello? Yes, it does, so we'll pop up an alert. Note that this is case sensitive. Calling test just returns true or false. If you instead call the string's search method, passing in the regular expression, you get back the position of the first match, or -1 if there isn't one. More complex patterns than single words are created using special characters. For example, if I create a regular expression that begins with the caret symbol, the caret denotes the start of the string we are matching against.
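A short sketch contrasting test and search, with hypothetical sample strings. Note that search is a method of the string, not of the regular expression, and that both calls are case sensitive.

```javascript
const greeting = /hello/;

// test() only answers true or false, and it is case sensitive.
console.log(greeting.test("well, hello")); // true
console.log(greeting.test("Hello there")); // false: capital H is not h

// search() is called on the string; it returns the index of the first match.
console.log("well, hello".search(greeting)); // 6
// When there is no match, search() returns -1.
console.log("Hello there".search(greeting)); // -1
```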
So hello would have to appear right at the start of the string. On the flip side, hello followed by a dollar sign means hello would have to appear at the end of the string. And we can get even more specific. If I use a plus sign in the regular expression, the previous character, in this case L, has to appear one or more times, so this would match hello with one L, with two Ls, or with a dozen or 500 Ls.
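A sketch of the caret, dollar sign, and plus sign, assuming the plus-sign example is written as /hel+o/ (the transcript doesn't spell out the pattern). The variable names are hypothetical.

```javascript
const startsWithHello = /^hello/; // caret: must match at the start of the string
const endsWithHello = /hello$/;   // dollar sign: must match at the end of the string
const oneOrMoreL = /hel+o/;       // plus: the preceding "l" must appear one or more times

console.log(startsWithHello.test("hello world")); // true
console.log(startsWithHello.test("say hello"));   // false
console.log(endsWithHello.test("say hello"));     // true
console.log(endsWithHello.test("hello world"));   // false

console.log(oneOrMoreL.test("helo"));     // true
console.log(oneOrMoreL.test("hello"));    // true
console.log(oneOrMoreL.test("helllllo")); // true
console.log(oneOrMoreL.test("heo"));      // false: at least one l is required
```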
If I use an asterisk instead, the previous character, L, can appear zero or more times, which means it would also match h-e-o, with no L at all. We can also use a question mark, which means zero or one time: h-e-o would match and h-e-l-o would match, but any more Ls than that would not. The pipe means either/or. In this case, it would be true if the string contained either hello or goodbye.
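A sketch of the asterisk, question mark, and pipe, assuming the on-screen patterns are /hel*o/, /hel?o/, and /hello|goodbye/. The variable names are hypothetical.

```javascript
const zeroOrMoreL = /hel*o/;            // asterisk: "l" may appear zero or more times
const zeroOrOneL = /hel?o/;             // question mark: "l" may appear zero or one time
const helloOrGoodbye = /hello|goodbye/; // pipe: either alternative may match

console.log(zeroOrMoreL.test("heo"));     // true
console.log(zeroOrMoreL.test("hellllo")); // true

console.log(zeroOrOneL.test("heo"));   // true
console.log(zeroOrOneL.test("helo"));  // true
console.log(zeroOrOneL.test("hello")); // false: two Ls is one too many

console.log(helloOrGoodbye.test("goodbye for now")); // true
console.log(helloOrGoodbye.test("see you later"));   // false
```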
If I use the point (a period) here, it means any single character except a line break, so it would match a whole bunch of different things. You will also see the backslash used a lot with regular expressions. So \w used as one little piece here means this must be an alphanumeric character or an underscore. \b means a word boundary: the position between a word character and a non-word character such as a space, punctuation, or a line break, or the start or end of the string. Used around hello, it means hello has to appear as a word by itself and not as part of another word.
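A sketch of the point, \w, and \b. The patterns /h.llo/, /^\w+$/, and /\bhello\b/ are assumptions chosen for illustration, not necessarily the ones shown on screen.

```javascript
const anyChar = /h.llo/;       // point: any single character except a line break
const wordChars = /^\w+$/;     // \w: letter, digit, or underscore; the whole string must be made of them
const wholeWord = /\bhello\b/; // \b on both sides: hello must stand alone as a word

console.log(anyChar.test("hallo"));   // true
console.log(anyChar.test("hxllo"));   // true
console.log(anyChar.test("hllo"));    // false: one character is missing
console.log(anyChar.test("h\nllo"));  // false: the point does not match a line break

console.log(wordChars.test("user_42")); // true
console.log(wordChars.test("user-42")); // false: a dash is not a word character

console.log(wholeWord.test("say hello.")); // true
console.log(wholeWord.test("othello"));    // false: hello is part of another word
```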
And very often you'll see square brackets used to denote a set of characters to match on. In this case, I've got c, r, n, l, and d inside square brackets, so any of those letters followed by o-p-e will match, but a different letter will not. As you're probably beginning to tell, there are a lot of these special characters, far more than I can show you here. We describe more complex patterns by stringing them together.
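A sketch of the bracketed character set from the example, using a hypothetical list of test words. The last line shows that a hyphen inside the brackets denotes a range, such as [0-9] for any digit.

```javascript
const ope = /[crnld]ope/; // any one of c, r, n, l, d followed by "ope"

const words = ["cope", "rope", "nope", "lope", "dope", "hope"];
const matching = words.filter((word) => ope.test(word));
console.log(matching.join(", ")); // cope, rope, nope, lope, dope

// A hyphen inside the brackets denotes a range: [0-9] is any digit.
console.log(/[0-9]/.test("route 66")); // true
```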
So as an example, here is one that checks for a valid US ZIP code format with an optional four-digit extension. We surround it with forward slashes at the start and end. Then we use the caret at the start to denote the start of the string, say which characters are allowed and how many of them there can be, and use question marks to denote whether parts are optional. Now, bear in mind you're likely to begin by finding examples of these online. If you're looking for a regular expression to match a credit card number format, a date, or a password, there's really no reason you should be writing it yourself from scratch.
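The transcript doesn't reproduce the ZIP code pattern itself, so here is one plausible version, offered as a sketch rather than the exact on-screen expression: five digits, optionally followed by a hyphen and four more digits, with nothing else in the string. The sample values are hypothetical.

```javascript
// ^ and $ anchor the pattern to the whole string.
// [0-9]{5}     : exactly five digits.
// (-[0-9]{4})? : an optional group of a hyphen plus four digits.
const zipCode = /^[0-9]{5}(-[0-9]{4})?$/;

const samples = ["90210", "90210-1234", "9021", "90210-12", "902101", "abcde"];
for (const sample of samples) {
  // Only "90210" and "90210-1234" match.
  console.log(sample, zipCode.test(sample));
}
```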
Now, the best regular expression is often a matter of considerable debate. People have been arguing for many years over what the so-called ideal regular expression for an email address would be. Here is a very simple one. It says that the line begins, marked by the caret, with one or more letters, digits, periods, dashes, and underscores. Then the @ must appear, followed by at least one more group of allowed characters, and the string must end with a top-level domain of two to four letters, like .com, .net, or .org, or a country code like .uk.
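A sketch of a simple email pattern along the lines described, assuming the domain part is one or more labels each followed by a period. This is an approximation, not necessarily the exact on-screen expression, and the sample addresses are hypothetical. The last two checks show how a pattern this simple rejects a plus sign in the address and a top-level domain longer than four letters.

```javascript
// ^[\w.-]+       : one or more letters, digits, underscores, periods, or dashes
// @              : a literal @
// ([\w-]+\.)+    : one or more domain labels, each followed by a period
// [a-zA-Z]{2,4}$ : a top-level domain of two to four letters at the end
const simpleEmail = /^[\w.-]+@([\w-]+\.)+[a-zA-Z]{2,4}$/;

console.log(simpleEmail.test("jane.doe@example.com"));        // true
console.log(simpleEmail.test("j_doe-99@mail.example.co.uk")); // true
console.log(simpleEmail.test("not-an-email"));                // false: no @

// Limitations of this simple pattern:
console.log(simpleEmail.test("jane+news@example.com"));  // false: plus sign not allowed
console.log(simpleEmail.test("curator@example.museum")); // false: TLD longer than four letters
```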
But here's the thing: email regular expressions are notoriously difficult, and this one actually isn't correct. It won't allow long top-level domains like .museum, which may be rare but certainly exist. And it doesn't allow plus signs in the address, which some mail systems use. In fact, the range of permissible email addresses is more complex than you might think, and if you include non-Western character sets, you get into far more complex regular expressions that can be dozens, or even hundreds, of lines long. There really is no perfect regular expression for validating email.
There is just a variety of ones that are good enough, and this is worth bearing in mind with regular expressions in general: there is a lot of knowledge out there. Right now, what you need to understand is what they are and how they are used. If you've never come across them before, regular expression syntax is not something you need to memorize just yet. They are a tool, something to use when you need them.