Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In this chapter, we will explore many useful regular expressions. These are some of the most common regular expressions that you will encounter, and I'll walk you through each one, and show you how it's created, so that you can see the logic and the choices that go into them, as well as the pitfalls to watch out for. But these regular expressions come with a few important usage instructions. The most important of these is that these are not one-size-fits-all regular expressions. Instead, they're meant to be examples of how to approach the problem, with a solution that might work for some people, some of the time, but using a regular expression requires that you tailor it to fit your own needs.
It's very probable that my regular expression will not match many common cases, and even fewer special cases, that you might need. That's because a regular expression can be written broadly, or narrowly. If you write it broadly, then it will match a lot of things; maybe too many things. Maybe you need it to be restrictive, but if I write it too broadly, then some of the things you wanted to exclude won't get ruled out. Or, you might write it narrowly, in which case it would be very restrictive, and in the process it becomes brittle, and breaks if there are lots of exceptions to the normal rules.
It's really going to be up to you, when writing a regular expression, to decide how you want to tailor it; whether you want it to be broad, whether you want it to be narrow, or somewhere in between. For example, let's say we're trying to create a regular expression that's going to match a year. Well, the simplest form of that would just be to write backslash d, repeated four times, \d{4}, and that would match a year. That would match 2005; that's the broadest possible way that we can match a year. But that also matches the year 0000, and the year 9999.
Now again, for your purposes you may not care. You may just really want to make sure that it's four digits. Or, you might want to limit the year, and ensure that it starts with 19 or 20, in which case you can modify your regular expression to require one of those two prefixes, followed by two more digits. My revised expression would match the years 1900 to 2099, but maybe that's still too broad. Maybe for your purposes you really need to tailor it to some specific years, so then you would narrow it further, so that it matches only years from 1950 to 2049. The first one is very broad, and might match more things than we intend. The last one is very narrow, and might be too restrictive on the dates that we've picked.
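To make the three levels of precision concrete, here is a minimal sketch in Python. The transcript only describes the patterns verbally, so the exact spellings below (and the helper names `accepts` and `year_report`) are assumptions chosen for illustration, not necessarily the expressions shown in the video.

```python
import re

# Assumed spellings of the three patterns described above.
BROAD = re.compile(r"\d{4}")                 # any four digits, 0000 to 9999
MEDIUM = re.compile(r"(?:19|20)\d{2}")       # 1900 to 2099
NARROW = re.compile(r"19[5-9]\d|20[0-4]\d")  # 1950 to 2049


def accepts(pattern, text):
    """Return True when the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


def year_report(years):
    """Map each candidate to (broad, medium, narrow) acceptance flags."""
    return {
        y: (accepts(BROAD, y), accepts(MEDIUM, y), accepts(NARROW, y))
        for y in years
    }


if __name__ == "__main__":
    candidates = ["0000", "1949", "1950", "2005", "2049", "2050", "9999"]
    for year, flags in year_report(candidates).items():
        print(year, flags)
```

Running a list of boundary values like this through each pattern is a quick way to see where a broad pattern lets too much in and where a narrow one starts rejecting plausible data.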
For example, what if our regular expression stays in use for more than 50 years, or what if we're building a database where we expect everyone's birth date to be from 1950 forward, but then someone comes along who actually has a birth date that's before 1950, and that's a surprise to us. They are an exception to the rule, and our regular expression won't let them enter data into our system. You get the idea, and this is for something that's relatively simple. Once our regular expressions start getting more complex, these kinds of issues pop up all the time. And it's not just with my regular expressions; it's with any regular expression that you find out there. You never want to use someone else's regular expression without checking it carefully, and fine-tuning it for your specific purpose.
If you've got a regular expression for a phone number, does it allow for a zero or one prefix at the beginning, or an extension at the end? Does a regex for a name allow for initials, or for Mr, or Mrs, or Dr? Or for junior or senior? It's important that you make sure that the regex is doing the job that you expect it to. Only you know what is good enough for your purposes. Here are some tips on how to write or customize a regular expression for your needs. My first tip is that you want to examine the data that's going to be matched. You need to have a good understanding of what's there, and what you're trying to match against, before you even start to write your regular expression.
Determine what aspects of the data are important. What do you care about? Does it matter that the dates fall within a specific range, or does it only matter that they are four digits long? And from that, you can determine what level of precision is going to be required in your regular expression. And then make a list of edge cases to test. By edge cases, what we mean are those 1% and 2% cases, where it's rare that they're going to come up, but we still need to make sure that we handle them. It's often helpful to think about the longest or shortest version of something that's possible, or the highest or lowest numbers, the most unusual, and the most oddly formatted.
Think about what those are, and decide whether your regular expression should match them, or should not match them. Make yourself a list of them, and then run your regular expression against them to test. The second thing you need to know about using this chapter is that when you're using regular expressions, you want to use anchors, and delimiters or context, to make sure that you're finding the match that you expect. For example, let's say that we were trying to match a word. So we write ourselves a regular expression that says, I'm looking for one or more word characters. Well, you might be surprised to find that that will generate a match off of this garbage string that I've got there, because there is an X in it.
We didn't say that it had to be a word character in any certain context; we just said that it had to find one, so it finds one, and it says, yep, I've got a match. What we might have really meant to do was say, I want to find, from the beginning to the end of the string, only word characters. That's very different from just saying, hey, can you find any word characters? Those anchors become important. We could do the same thing with word boundaries, with spaces, or with commas, and depending on the context, those kinds of delimiters can help us to find what we're looking for in the right context.
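The difference between an unanchored and an anchored pattern can be sketched in a few lines of Python. The garbage string here is a hypothetical stand-in for the one shown on screen, and the function names are made up for this example.

```python
import re

# Hypothetical stand-in for the garbage string in the video:
# punctuation with a single "x" buried in the middle.
GARBAGE = "%$#x@!&"


def contains_word_chars(text):
    """Unanchored: succeeds if word characters appear anywhere."""
    return re.search(r"\w+", text) is not None


def is_only_word_chars(text):
    """Anchored: the whole string must consist of word characters."""
    return re.search(r"^\w+$", text) is not None


def find_whole_word(word, text):
    """Use word boundaries so 'cat' does not match inside 'concat'."""
    return re.findall(r"\b" + re.escape(word) + r"\b", text)


if __name__ == "__main__":
    print(contains_word_chars(GARBAGE))   # unanchored search finds the "x"
    print(is_only_word_chars(GARBAGE))    # anchored search rejects the string
    print(find_whole_word("cat", "cat concat cats"))
```

The pattern `\w+` is the same in both of the first two functions; only the anchors change what counts as a match.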
You might even be able to just use a larger part of the context around what you're looking for. For example, if we're looking for the last word in a list, the pattern might be the word "and", then a space, then a word, followed by a period. That context helps make sure that we get exactly what we're looking for. And last of all, you should always be mindful of greediness and laziness. You want to be careful that you don't craft a regular expression that's grabbing more than you intend. We'll see how these factors come into play as we start things out by looking at how to match a name.
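Both of those last points, using surrounding context and watching greediness, can be illustrated with a short Python sketch. The sample sentences and function names are assumptions made up for this example rather than anything taken from the course files.

```python
import re


def last_list_item(sentence):
    """Use context: the word 'and', a space, a word, then a period."""
    match = re.search(r"\band (\w+)\.", sentence)
    return match.group(1) if match else None


def quoted_greedy(text):
    """Greedy .+ runs from the first quote to the last quote on the line."""
    return re.findall(r'".+"', text)


def quoted_lazy(text):
    """Lazy .+? stops at the nearest closing quote."""
    return re.findall(r'".+?"', text)


if __name__ == "__main__":
    print(last_list_item("We bought apples, pears, and plums."))
    sample = 'say "yes" or "no" now'
    print(quoted_greedy(sample))
    print(quoted_lazy(sample))
```

On the sample string, the greedy version returns one long match that swallows both quoted words and the text between them, while the lazy version returns each quoted word separately.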