Released 11/21/2011
 Creating flexible patterns using character sets
 Achieving efficiency when using repetition
 Understanding different types of search strategies
 Writing logical and efficient alternations
 Capturing groups and reusing them with backreferences
 Developing complex patterns with lookaround assertions
 Working with Unicode and multibyte characters
 Matching email addresses, URLs, dates, HTML tags, and credit card numbers
 Using search and replace to format a document
Skill Level Intermediate
In this movie, we'll learn how to write regular expressions to match times. When working with times, it's important to first decide what time formats we're going to be working with, even more so than when we were working with dates, because times can come in a lot of different formats. For example, we have to decide, are we working with 12 hour, or 24 hour formatted time? And if it's 12 hour, will a.m. and p.m. be in uppercase, lowercase, or both? Can there be a leading 0 in front of hours with single digits? Will we include seconds, or just work with the hour and minute? Will we include a time zone at the end? And if we have a time zone, is it going to be a three letter abbreviation, or expressed as an offset from Greenwich Mean Time? The answers to these questions and more will determine the regular expressions that we need.
Let's look at some sample times. So there is the simplest one, which is just 2:34: 2 o'clock and 34 minutes, with a colon in between. We can put a pm after that in lowercase, or a PM in uppercase. We can put a leading 0 in front of the 2. We can choose to express it in 24 hour time, in which case the 2 o'clock becomes 14. We can include optional seconds after the hour and minute, and we can include the time zone, either by using a three letter abbreviation, or by expressing it as an offset from Greenwich Mean Time.
Now, because there are so many possibilities here, we're not going to try to write one regular expression that will match all of them. Instead, we'll work through some of the options, and then you can use those pieces to assemble a regular expression that matches the time patterns that you require. To start this out, let's turn on multiline anchors, and put in our start and end anchors. That will ensure that our time regex only matches when the time is the entire content of a single line. Let's just do a quick and dirty one to get started. Let's just match that very first one, and we know we can do that with a \d for a digit, followed by a colon, and then two more \d, so we've matched the first one.
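As a rough sketch of that quick and dirty starting point, here it is in Python. The sample lines are assumptions for illustration (the exact sample file used in the movie is not shown in this transcript), and `re.MULTILINE` plays the role of the multiline anchors option.

```python
import re

# Hypothetical sample lines, one time per line (not the movie's exact file).
SAMPLES = "2:34\n2:34pm\n02:34\n14:34"

# With re.MULTILINE, ^ and $ anchor at the start and end of each line.
QUICK = re.compile(r"^\d:\d\d$", re.MULTILINE)

quick_matches = QUICK.findall(SAMPLES)
print(quick_matches)  # only the plain one-digit-hour sample matches
```

Only the first sample matches at this stage, because the pattern allows exactly one hour digit and nothing after the minutes.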
Obviously, 2 o'clock is not the only hour possible. There are also 10, 11, and 12 o'clock, which require two digits. So let's add in the possibility that it's either one or two digits. Let's go ahead and change the way we express the minutes to use the same notation, so that it's exactly two digits at the end. So now we've actually matched three of our examples. Unfortunately, it also matches times like 99:99, which is not a valid time. We need to be a little more specific about what numbers are actually allowed to be there. This is the same problem we saw when we were working with IP addresses, and with dates, where we have a number-as-a-string problem.
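A minimal sketch of that looser pattern, assuming the repetition is written with `{1,2}` and `{2}` quantifiers, shows both the improvement and the problem:

```python
import re

# One or two digits for the hour, exactly two digits for the minutes.
LOOSE = re.compile(r"^\d{1,2}:\d{2}$")

candidates = ["2:34", "02:34", "14:34", "99:99"]
loose_results = {t: bool(LOOSE.match(t)) for t in candidates}
print(loose_results)
```

Every candidate matches, including 99:99, which is why the digits need to be constrained position by position.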
We can't specify a range of numbers. Instead, we have to handle each character position separately, as a string. So for example, for the minutes, instead of just saying that it can be any two digits, we need to say that the first digit can be any number from 0 to 5. Then, after that, the second character can be any number from 0 to 9. This will allow for minutes ranging from 00, all the way up to 59. Now, note that it's not possible just to write [0-59].
Inside the square brackets is a character set, not a range of numbers. So [0-59] is a range of characters from 0 to 5, plus the character 9, which is not the same thing as what we have here. Now, let's do the same thing for the hour. It's not enough just to say that the hour can be any one or two digits. Instead, we should specify that those digits can either be (and I'll put parentheses here, because it's going to be an alternation) the numbers 1 through 9, or a 1 followed by either 0, 1, or 2.
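The minutes rule and the character set pitfall just described can be checked with a small sketch. The names `WRONG` and `MINUTES` are made up for this illustration.

```python
import re

# A character set, not a number range: matches ONE character, 0-5 or 9.
WRONG = re.compile(r"^[0-59]$")

# Two positions handled separately: first digit 0-5, second digit 0-9.
MINUTES = re.compile(r"^[0-5][0-9]$")

wrong_single = [c for c in "0123456789" if WRONG.match(c)]
valid_minutes = [m for m in ["00", "34", "59", "60", "99"] if MINUTES.match(m)]
print(wrong_single)
print(valid_minutes)
```

The first pattern accepts a single character and never a two-digit minute, while the second accepts 00 through 59 and rejects 60 and 99.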
And, of course, you can use a range, if you prefer, inside that character set, instead of writing them out. Now, this matches our first example, but it doesn't match the example that has the leading 0, so we need to add an optional 0 as well. If it's 1 to 9, then there is a possibility that there is a 0, which is optional, which has that question mark after it. So now we've matched both of those, and we've matched the basics of 12 hour time. Note that it does not match 14:34 anymore. That's good, because we limited those numbers to only go up to 12. Let's stick with working with 12 hour time for now, and let's try adding the a.m. and the p.m. at the end.
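Putting the hour alternation, the optional leading 0, and the minutes together gives something like the following sketch of basic 12 hour time. The test strings are illustrative assumptions.

```python
import re

# Hour: optional leading 0 then 1-9, OR a 1 followed by 0, 1, or 2.
# Minutes: 00 through 59.
TWELVE_HOUR = re.compile(r"^(0?[1-9]|1[0-2]):[0-5][0-9]$")

tests = ["2:34", "02:34", "10:15", "12:00", "14:34", "0:30", "99:99"]
twelve_hour_ok = [t for t in tests if TWELVE_HOUR.match(t)]
print(twelve_hour_ok)
```

Note that 14:34 and 0:30 are rejected here, since the hour is limited to 1 through 12.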
So this is going to be an optional segment. It's optional whether it has the a.m. or p.m. So I'm going to put an optional group here, and inside that optional group, there are two ways that we can do this. The first is, you could just use an alternation in there: it's am, or it's pm, or it's AM, or it's PM. I find that to be pretty simple and readable. You can instead, though, use character sets, and say that the first character can be lowercase a, uppercase A, lowercase p, or uppercase P, and then the next character is either lowercase m or uppercase M. Either way works.
They both match our sample times. It's really a matter of style and personal preference. Now, the second one does allow you to have a capital A followed by a lowercase m; the first one didn't allow that. And it also allows a lowercase a followed by a capital M, which might not be desirable. So again, that's really up to you, and what you're trying to match. But now we have a regular expression that matches all four examples of 12 hour time. Okay, so now you may want to copy and paste that regex somewhere to save for later, because what we're going to do now is tear it apart, and change it to work with 24 hour time instead.
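Here is a sketch comparing the two styles. It assumes the am/pm suffix follows the minutes directly with no space, which the transcript does not confirm; add a space inside the group if your data has one.

```python
import re

# Assumption: the suffix follows the minutes immediately (e.g. "2:34pm").
TIME = r"^(0?[1-9]|1[0-2]):[0-5][0-9]"

# Style 1: explicit alternation, all-lowercase or all-uppercase only.
ALTERNATION = re.compile(TIME + r"(am|pm|AM|PM)?$")

# Style 2: character sets, which also allow mixed case such as "Pm".
CHAR_SETS = re.compile(TIME + r"([aApP][mM])?$")

samples = ["2:34", "2:34pm", "2:34PM", "02:34am", "2:34Pm", "2:34aM"]
alternation_ok = [s for s in samples if ALTERNATION.match(s)]
char_sets_ok = [s for s in samples if CHAR_SETS.match(s)]
print(alternation_ok)
print(char_sets_ok)
```

The two patterns agree on the consistently cased samples and differ only on mixed case, so the choice depends on whether inputs like "Pm" should count as valid.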
The first thing we're going to do is start by removing the AM and the PM, including the question mark, because we're not going to be needing those with 24 hour time. The minutes can stay the same, but the hour is going to need to be modified. Instead of just allowing the numbers 10 to 12, now a 1 can be followed by anything from 0 all the way up to 9, for 10 through 19, or we can have a 2, which is followed by 0 up to 3. Not 4; not 24, because at 24, the time wraps around, and becomes 00 again.
Midnight is 00 in 24 hour time, which reminds us that we also need to change the single-digit hours from being 1 to 9; they can now be 0 to 9. As a good test for this, let's try a few additional sample times. We've got 0:02, that should be valid. 23:45, that should work, but 24:45 should not, because at 24, it wraps around. That's the same thing as saying 00:45. Now, we can also simplify the regex that we just wrote a bit.
Notice that we have an optional 0, which can be followed by 0 to 9, and we have a 1, which can also be followed by 0 to 9. We can combine those together. We could say that there is an optional first character that is either 0 or 1, followed by a 0 to 9. It matches the same thing exactly; it's just a little more concise. Okay. Now, we've worked through both 12 hour, and 24 hour time, so let's look at adding optional seconds. It's pretty simple. We just take the minutes, including the colon, we'll copy those, and then at the end here, and I'll put in an optional group, we'll put in our colon, followed by the seconds.
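The 24 hour pattern and its simplified hour can be sketched as follows. The names `LONG_FORM` and `SHORT_FORM` are made up here, and the loop is just an illustrative check that the two agree on hours 0 through 29, with and without a leading zero.

```python
import re

# Before simplifying: optional 0 then 0-9, OR 1 then 0-9, OR 2 then 0-3.
LONG_FORM = re.compile(r"^(0?[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$")

# After simplifying: optional 0 or 1, then 0-9, OR 2 then 0-3.
SHORT_FORM = re.compile(r"^([01]?[0-9]|2[0-3]):[0-5][0-9]$")

samples = ["0:02", "00:45", "14:34", "23:45", "24:45"]
short_ok = [s for s in samples if SHORT_FORM.match(s)]

# Compare both forms on hours 0-29, written as "7:45" and as "07:45".
candidates = [f"{h}:45" for h in range(30)] + [f"{h:02d}:45" for h in range(30)]
forms_agree = all(
    bool(LONG_FORM.match(c)) == bool(SHORT_FORM.match(c)) for c in candidates
)
print(short_ok, forms_agree)
```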
Make sure that the colon is inside the parentheses, because if we're not using the seconds, the colon won't be there either. The colon only exists if the seconds are there. Now, at the end of this, let's add an optional segment for the time zone. So right after that, let's put in another set of parentheses with a question mark, because it doesn't have to be included. Let's also put a space inside there, because we're going to have a space before we get to the time zone. Let's start by just doing the three letter abbreviation, like EST, and rather than trying to figure out what all those possibilities are, I'm just going to allow it to be any uppercase character A to Z, repeated three times, and that will match.
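A sketch of the pattern at this point, with optional seconds and an optional three letter time zone, might look like this. The sample strings are assumptions for illustration.

```python
import re

WITH_SECONDS_TZ = re.compile(
    r"^([01]?[0-9]|2[0-3]):[0-5][0-9]"  # 24 hour time: hour and minutes
    r"(:[0-5][0-9])?"                   # optional seconds; the colon is inside the group
    r"( [A-Z]{3})?$"                    # optional space plus three uppercase letters
)

samples = [
    "14:34", "14:34:56", "14:34 EST", "14:34:56 EST",
    "14:34:", "14:34:56EST", "14:34:56 est",
]
seconds_tz_ok = [s for s in samples if WITH_SECONDS_TZ.match(s)]
print(seconds_tz_ok)
```

Because the colon sits inside the optional seconds group, a dangling "14:34:" is rejected, and because the space sits inside the time zone group, "14:34:56EST" is rejected too.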
Now, you could be more specific than that, but there are so many time zones out there, it may not be worth the bother. Now, what if we wanted to allow it to match offsets from Greenwich Mean Time too? Well, we could just make another optional segment for the minus 5 that comes after those three characters. But that would allow any three letters to come before that minus 5, when really, it's always going to be GMT. It's going to be those exact letters. So instead, let's make another alternation. So I'm going to put parentheses here around these three characters, and then an alternation that says, or it's going to be GMT, followed by a space, and then for now, let's just put in minus 5, just so that it matches exactly.
So it matches the literal characters there. Now, we can replace the minus with the possibility that it's either a minus or a plus inside a character set, and then for the digit here, it can actually be one or two digits, and it can be any number up to 12. We know how to do that. Let's put in a group here, and inside that group, we're going to say that it is a character set of 0 to 9, or (let's put our alternation here; there you go) it's possible that there's a 1, followed by a 0 up to 2.
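One way to keep track of all those parentheses is to assemble the expression from named pieces. This is only a sketch: the piece names are invented for illustration, and the "GMT -5" spacing is assumed from the description above.

```python
import re

HOUR = r"([01]?[0-9]|2[0-3])"        # 0-23, optional leading zero
MINUTE = r"[0-5][0-9]"               # 00-59
SECONDS = r"(:[0-5][0-9])?"          # optional :00 through :59
OFFSET = r"GMT [-+]([0-9]|1[0-2])"   # e.g. "GMT -5" or "GMT +12"
ZONE = r"( ([A-Z]{3}|" + OFFSET + r"))?"  # optional: abbreviation OR GMT offset

FULL = re.compile("^" + HOUR + ":" + MINUTE + SECONDS + ZONE + "$")

samples = [
    "14:34", "14:34:56 EST", "14:34:56 GMT -5", "14:34 GMT +12",
    "14:34:56 GMT -13", "14:34:56 EST -5",
]
full_ok = [s for s in samples if FULL.match(s)]
print(full_ok)
```

With the alternation in place, an offset is only accepted after the literal letters GMT, so "EST -5" is rejected, and the offset itself stops at 12.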
Now, be careful and keep good track of your parentheses here. There are lots of parentheses. The code coloring can really help you to make sure that you get it right. So now we are matching all examples that we have of 24 hour time, including the ones that have optional seconds, and the time zone. So we've covered most of the options. Again, I'm not trying to give you one expression that matches all times, but instead to show you the way to think about these issues, so that you can construct an expression that suits your needs.

Video: Matching times