Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In this movie, we'll learn to write a regular expression to match dates. The format we're going to use for the dates is year, followed by month, followed by date. Obviously, if you wanted them in a different order, month-date-year for example, you would just swap the pieces around, but we're going to stick to this format: year, then month, then date. So for example, 2000-11-15 would be a date. We have 2000 for the year, 11 for the month, November, and then 15 for the date inside November. So, November 15th, 2000.
Of course, the month and the date can each be a one digit number instead of a two digit number, so we could have 2000-6-9. Or we could have some leading zeros in front of those: 2000-06-09. We could also have different delimiters. So instead of the dashes, we could have slashes: 2000/6/9. All of these are valid dates, and we're going to write a regular expression that will match all of them. So to start out with, like we've been doing in the other movies, let's click multi-line anchors, and let's put in our start and end anchors here to make sure that each match covers exactly one whole line.
That's because we're using multiple lines of data here instead of just a single line of data. And then, to start with, let's just take a first stab at it. Let's just put in four digits, followed by a dash, followed by two digits, followed by a dash, followed by two more digits. So just like that, we've matched two out of our four entries, just with something very simple. So now let's start revising it. Well, we know that the month might not be exactly two digits; it might end up being one or two digits. So we know how to use the quantifiers for that. And we can do the same thing for the date, and say that it could match one or two digits.
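As a rough sketch of these first two steps, here is how they might look in Python. The transcript works in an on-screen regex tester, so the language, the `re.MULTILINE` flag standing in for the multi-line anchors option, and the variable names are all assumptions. The four sample lines are the dates mentioned earlier.

```python
import re

# Sample data assumed from the example dates mentioned in the transcript.
SAMPLE = "2000-11-15\n2000-6-9\n2000-06-09\n2000/6/9"

# First stab: four digits, a dash, two digits, a dash, two more digits.
# re.MULTILINE lets ^ and $ anchor at the start and end of every line.
first_try = re.compile(r"^\d\d\d\d-\d\d-\d\d$", re.MULTILINE)

# Revision: the month and the date may each be one or two digits.
revised = re.compile(r"^\d\d\d\d-\d{1,2}-\d{1,2}$", re.MULTILINE)

first_matches = first_try.findall(SAMPLE)
revised_matches = revised.findall(SAMPLE)

print(first_matches)    # ['2000-11-15', '2000-06-09']
print(revised_matches)  # ['2000-11-15', '2000-6-9', '2000-06-09']
```

The slash-delimited line is still missed at this point, which is what the next revision deals with.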
Now we picked up this case in the middle: 2000-6-9. While we're at it, let's go ahead and use our quantifier for the year digits too, and let's just say that the year is going to be four digits. We've matched the ones that have the dashes, but we haven't matched the one that has the forward slashes. So what we need to do is turn this hyphen into a character set, and we can put hyphen and forward slash inside that character set. Now, it's up to you whether you escape that hyphen. Because it's the first thing in the set, with nothing in front of it, I think you won't have any problems with the regular expression knowing that you mean the character, and that it's not a range operator.
And just like that, we've now matched all four of our cases. Now, for a lot of cases, this may be enough for you. You may just want to make sure that we have the digits, and that the digits are in the right place. But if you look carefully at this, you'll realize that we haven't actually put any limits on the values. Someone could type in 2000-14-55, and that matches too, and that's not a valid date. So let's see how we can go about eliminating this edge case, and make sure that it doesn't match as a date. Well, first let's start with just the month portion, and focus on that.
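The pattern at this stage, and the gap just described, can be checked with a short Python sketch. The choice of Python and the variable names are assumptions; the pattern itself follows the steps described so far.

```python
import re

# Four digits for the year, then a dash or a slash, then one or two digits,
# twice over. The hyphen comes first in the set, so it is a literal character.
pattern = re.compile(r"^\d{4}[-/]\d{1,2}[-/]\d{1,2}$")

samples = ["2000-11-15", "2000-6-9", "2000-06-09", "2000/6/9"]
all_match = all(pattern.fullmatch(s) for s in samples)
print(all_match)  # True

# The shape is right, so this impossible date is accepted as well.
bad_accepted = pattern.fullmatch("2000-14-55") is not None
print(bad_accepted)  # True
```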
So right here, in place of these two digits here, let's be a little more specific. Instead, let's put in parentheses, and inside our parentheses, we'll try to specify the numbers 1 through 12, because there can be 1 for January, or 12 for December. So, you don't want to do 1 dash 12. That doesn't mean anything here; that's not a range. Instead, we have to break it out as a string. This is the thing we talked about in the last movie with the IP addresses; this is a number as string problem. We can't do a range of numbers; we have to deal with it as a string only.
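To see why "1 dash 12" does not work as a numeric range, here is a small Python sketch. Both spellings tried below are guesses at what someone might type, not patterns from the course.

```python
import re

# Inside a character set, "1-1" is a range holding only "1", and "2" is a
# separate member, so [1-12] means "a single character, either 1 or 2".
char_set = re.compile(r"[1-12]")

# Inside a group, the hyphen is just a literal character, so (1-12) only
# matches the literal text "1-12".
group = re.compile(r"(1-12)")

set_results = {s: char_set.fullmatch(s) is not None for s in ["1", "2", "7", "12"]}
group_results = {s: group.fullmatch(s) is not None for s in ["7", "12", "1-12"]}

print(set_results)    # {'1': True, '2': True, '7': False, '12': False}
print(group_results)  # {'7': False, '12': False, '1-12': True}
```

Neither version accepts 7 or 12 as a month, which is why the number has to be described digit by digit.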
The regex engine doesn't look at this as being what numbers come between a certain range. So how can we do that? Well, we know that we have the numbers 1 through 9. We can't have 0; 0 is not a valid month. It's got to be the numbers 1 through 9, or it's a two digit number, and in that case, it would start with a 1. And then after the 1, we could have 0, 1, or 2. You could also write that as a range with a dash, 0 to 2, if that felt more comfortable to you. It's the same thing. Now, we've matched all of these, except we don't match our leading zero case anymore, where we have 06. That's because we need to also put a 0 in front of the 1 through 9, and put a question mark after it, because it's optional.
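Here is a minimal Python sketch of the month group described above. The names `MONTH`, `month_only`, and `pattern` are illustrative, and the date portion is still left as one or two digits.

```python
import re

# An optional leading 0 followed by 1-9, or a 1 followed by 0, 1, or 2.
MONTH = r"(0?[1-9]|1[0-2])"
month_only = re.compile(MONTH)

valid = [str(n) for n in range(1, 13)] + ["01", "09"]
invalid = ["0", "00", "13", "14"]
months_ok = all(month_only.fullmatch(m) for m in valid)
months_rejected = not any(month_only.fullmatch(m) for m in invalid)
print(months_ok, months_rejected)  # True True

# Dropped into the full pattern, with the date still unrestricted.
pattern = re.compile(r"^\d{4}[-/]" + MONTH + r"[-/]\d{1,2}$")
print(pattern.fullmatch("2000-06-09") is not None)  # True
print(pattern.fullmatch("2000-14-55") is not None)  # False: no month 14
print(pattern.fullmatch("2000-12-55") is not None)  # True: date not checked yet
```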
Now we've matched our first four cases, but not our last case, which is what we want. We still haven't fixed the date problem, though. If we changed the month to 12, it would still match, even though there aren't 55 days in December. So let's fix the date portion now. That'll be this portion right here. Let's put in our parentheses again. Before we write it, let's think about the possibilities. We can have anything from 1 to 31. How would we break that down? The way we would break it down is to say that the first possibility is the numbers 30 to 31, in which case we would have a 3 at the front, followed by a 0 or a 1.
And then we would have 20 to 29 as the next possibility, so there is a 2 in front, and then a 0 to 9. Now, notice we can't combine those, right? Because then that would suddenly allow the possibility for there being 39, and that's not valid. So we have to keep those distinct and separate. Then we have 10 to 19; same thing. Very similar to what we had for 20 to 29. And then 01 to 09, with the 0 being optional. And then lastly, as we already mentioned, 0 itself is not allowed. Now, the numbers 10 to 29, we can actually write a shorter version of. We can combine those, because they both do allow us to go from 0 to 9.
We can smash those together, and say okay, either 1 or 2 as the first digit, and 0 to 9 as the second digit. And then once we've got our breakdown, we can take all of these cases, and put them together with alternation. So right here inside my parentheses, I'll just paste in all those possibilities. So now you can see that it does rule out 55. If I put in 31, now it works. So now we've written a regular expression that makes sure that the month is a valid month, and that the date is at least reasonable. It's at least a number between 1 and 31. Now, we haven't ruled out the possibility that someone could ask for November 31st, or February 30th, and it certainly doesn't even know about leap years.
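The date breakdown above can be sketched in Python as follows. The names `MONTH`, `DAY`, and `looks_like_date` are illustrative, and the year is still any four digits.

```python
import re

MONTH = r"(0?[1-9]|1[0-2])"
# 30-31, or 10-29, or 1-9 with an optional leading zero.
DAY = r"(3[01]|[12][0-9]|0?[1-9])"
pattern = re.compile(r"^\d{4}[-/]" + MONTH + r"[-/]" + DAY + r"$")


def looks_like_date(text):
    """Return True if text has a plausible year, month, and date."""
    return pattern.fullmatch(text) is not None


print(looks_like_date("2000-12-55"))  # False
print(looks_like_date("2000-12-31"))  # True
print(looks_like_date("2000-12-32"))  # False
print(looks_like_date("2000-12-0"))   # False

# The limits mentioned above: these are not real dates, but they pass.
print(looks_like_date("2000-11-31"))  # True
print(looks_like_date("2000-2-30"))   # True
```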
It is possible to write a more complex regular expression that will handle those, but I think typically those situations are better handled by a programming language which actually has knowledge about valid dates built into it. So the programming language would just take what looks like it's a proper date, and then test it, and say, hey programming language, is this a valid date? And it would say yes or no. We've already done some prechecking, but we'll let the programming language actually tell us whether it's valid or not. Now, as for the year, we've left the year wide open. Someone could type in the year 3000, and it would still come up as being a valid year.
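The hand-off to a programming language described above might look something like this in Python. This is a sketch under assumptions: the course does not name a language, and `is_real_date` is a hypothetical helper. The regular expression does the prechecking, and the standard library's `datetime.date` decides whether the date actually exists.

```python
import re
from datetime import date

# The year is captured here so it can be passed along for validation.
DATE_RE = re.compile(
    r"^(\d{4})[-/](0?[1-9]|1[0-2])[-/](3[01]|[12][0-9]|0?[1-9])$"
)


def is_real_date(text):
    """Precheck the shape with the regex, then ask datetime for the verdict."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


print(is_real_date("2000/6/9"))    # True
print(is_real_date("2000-11-31"))  # False: November has 30 days
print(is_real_date("2000-02-29"))  # True: 2000 was a leap year
print(is_real_date("2001-02-29"))  # False: 2001 was not
```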
Obviously, if you wanted to make that narrower, then you could do the same kind of thing there. You could say, all right, well it has to be a year that's in the 2000s. So you could start narrowing it down, and make sure that it meets your criteria. That really is going to depend on what your needs are. For example, if you want to make sure that it's a year between 1950 and 2050, then you're going to need to modify this, and let's go ahead and do that real quick, so we can see what that would look like. Let's put in parentheses. And we could say, alright, it's going to be either 19, followed by 5 to 9 as the third digit, with 0 to 9 as the last digit; that will take care of the years 1950 to 1999. Or it's a year in the 2000s, in which case it's 20, followed by 0 to 4, and after the 0 to 4, we can allow all ten digits, 0 to 9; that takes care of 2000 to 2049.
But then after that, the last possibility we still want to allow for is 2050. So, now this will allow dates with years from 1950 to 2050, but not 2051. That will be disallowed. So those kinds of changes will really depend on your specific needs. The main thing you need to understand is how you break down these dates and put in the alternations, so that we can deal with numbers as strings, and that there are some limits to even what this regex will check for that may be best left to a programming language.
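The year restriction can be sketched the same way in Python. The names are illustrative, and the loop simply tries every year from 1900 to 2100 to see which ones the pattern lets through.

```python
import re

# 1950-1999, or 2000-2049, or exactly 2050.
YEAR = r"(19[5-9][0-9]|20[0-4][0-9]|2050)"
MONTH = r"(0?[1-9]|1[0-2])"
DAY = r"(3[01]|[12][0-9]|0?[1-9])"
pattern = re.compile(r"^" + YEAR + r"[-/]" + MONTH + r"[-/]" + DAY + r"$")

# Collect every year in 1900-2100 that the pattern accepts.
accepted_years = [
    y for y in range(1900, 2101) if pattern.fullmatch("%d-12-31" % y)
]

print(accepted_years[0], accepted_years[-1])  # 1950 2050
print(pattern.fullmatch("2051-1-1") is None)  # True
```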
Now, as an exercise on your own, try to match dates that are formatted as January, space, 31, comma, space, 2012. It could be any date; it might be in February, it might be November. You pick the date, but just try and do it with this, using a full text string, instead of the numerical format that we were working with before.