Learn how to find and manipulate text quickly and easily using regular expressions. Author Kevin Skoglund covers the basic syntax of regular expressions, shows how to create flexible matching patterns, and demonstrates how the regular expression engine parses text to find matches. The course also covers referring back to previous matches with backreferences and creating complex matching patterns with lookaround assertions, and explores the most common applications of regular expressions.
In the last movie we talked about character sets, and we saw that if you were writing a character set for, let's say, all capital letters, you would have to type out A, B, C, D, and so on, all the way until you got to Z. It's a lot of typing. Character ranges are going to help us solve that problem by giving us a convenient shortcut: we use the dash metacharacter to indicate a range of characters. So it's basically just a shorthand to keep you from having to do all of that typing. It represents all characters between a starting character and an ending character. Now the dash is only a metacharacter when it's inside a character set; outside the character set it's just a literal dash.
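To make the inside/outside distinction concrete, here is a minimal sketch in Python's `re` module. The course itself does not name a language, so Python and the sample strings are assumptions chosen only for illustration.

```python
import re

# Inside a character set, the dash marks a range: any one of a, b, or c.
inside = re.findall(r"[a-c]", "a-b-c-d")
print(inside)   # ['a', 'b', 'c']

# Outside a character set, the dash is just a literal dash character.
outside = re.findall(r"a-c", "a-c abc")
print(outside)  # ['a-c']
```

Note that the first pattern skips the dashes and the `d` in the sample string, while the second only matches the three literal characters `a-c` in sequence.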
Okay, so this really is just about character sets. So for example, instead of writing out all the numbers 0 to 9, we can just abbreviate that by saying 0-9. Now obviously the position is important, because what we're saying is: starting with 0, go up to 9. We could also have 5-9 or 3-7, but either way it's a range from the starting character until we get to the ending character. We can do the same thing with the alphabet. We can have A-Z and then immediately after it a-z lowercase. It's still telling us, look, all the characters A-Z capitalized and all the characters a-z lowercase are inside this character set. And of course, we could even break it up: a-e, k-o, and u-y.
So in the end that will have fifteen characters represented inside our character set, exactly as if we had typed them all out. Now one word of caution though: when you're working with numbers, 50-99 is not all numbers from 50 to 99. We're looking at text here, not numbers; this is just text. It doesn't have the same meaning. If we say 50-99, that's the same thing as saying 0 to 9, because what we're saying is this set includes the character 5, then it includes all characters 0-9, and it also includes the character 9. That's our character set.
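Both claims above, the fifteen characters and the 50-99 caution, can be checked with a short sketch. Python's `re` module is an assumption here, not something the course specifies, and the variable names are made up for the example.

```python
import re
import string

# Three ranges in one set: a-e, k-o, and u-y.
split_set = re.compile(r"[a-ek-ou-y]")
letters = [ch for ch in string.ascii_lowercase if split_set.fullmatch(ch)]
print(len(letters), "".join(letters))  # 15 abcdeklmnouvwxy

# [50-99] is read as: the character 5, the range 0-9, the character 9.
digits = "0123456789"
as_written = [ch for ch in digits if re.fullmatch(r"[50-99]", ch)]
plain = [ch for ch in digits if re.fullmatch(r"[0-9]", ch)]
print(as_written == plain)  # True

# The set matches one character at a time, never a two-digit number.
print(re.findall(r"[50-99]", "42"))  # ['4', '2']
```

The last line shows why the caution matters: 42 is not between 50 and 99, yet both of its digits match.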
We've just repeated ourselves with the 5 and the 9 on either end, so be careful about that. Integers do not increment the way that you normally think that they would in math, right. This is not computer programming; we're just representing the actual characters that are there on the screen. The character set still matches just one single character. Now it might be possible to do a range of characters with things like punctuation, but since the order of those really isn't that obvious, you would probably get unexpected results, and so you really shouldn't do it. Really what you're going to use ranges for are the ones that have an obvious progression--the numbers and the letters. Let's try a few.
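On the point about punctuation ranges, here is one way the surprise can show up, sketched in Python (an assumed language; the goal of matching a plus, a dash, or a slash is a hypothetical example, not one from the course). In ASCII order the comma and the period happen to sit between the plus and the slash.

```python
import re

sample = "+ , - . /"

# Hypothetical goal: match a plus, a dash, or a slash.
# Written this way, the dash sits between + and / and forms a range.
accidental = re.findall(r"[+-/]", sample)
print(accidental)  # ['+', ',', '-', '.', '/']

# Putting the dash first in the set keeps it a literal dash.
intended = re.findall(r"[-+/]", sample)
print(intended)    # ['+', '-', '/']
```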
So first, in the example before, remember we typed out all of this? Well, instead we can just shorten all that down to just saying, well, everything from A-Z. That gives us our same match; it's the exact same thing, completely equivalent. Much easier. If we wanted to match all lowercase letters too--let's take away that ello there and just do a-z--now see, it matches every letter, whether it's uppercase or lowercase. We've defined a character set that's case insensitive. As another example, let's say that we had a phone number here. Let's say we have 555-666-7890. Okay, so it's a phone number in America that has that format.
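The two letter sets from this step can be sketched as follows. Python is an assumed language, and the sample text is hypothetical, since the transcript does not show the text used on screen.

```python
import re

text = "Hello World 123"  # hypothetical sample text

# Uppercase letters only.
upper_only = re.findall(r"[A-Z]", text)
print(upper_only)  # ['H', 'W']

# Two ranges side by side: any letter, uppercase or lowercase.
any_letter = re.findall(r"[A-Za-z]", text)
print("".join(any_letter))  # HelloWorld
```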
So if we wanted to write a regular expression that would match that--let's erase this--we want to have a character set, and inside that character set we want to match all digits 0-9. That's what's allowed to be in our character set. Now if we want to match the whole number, not just a single character, we would paste it in again and again--that matches the first three numbers--followed by a dash. You can see where it matches those, followed by three more numbers, followed by a dash, followed by four more numbers. Now we're going to see some even better ways to do this in the future; this is the really raw, simplistic version for now, before we start learning how to do things like repetition.
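Written out in full, the pasted-together pattern described above looks like this. It is a minimal sketch in Python (an assumption, as is the surrounding sentence used as sample text); the dashes between the sets are outside any character set, so they are literal.

```python
import re

# One [0-9] set per digit, with literal dashes between the groups.
phone = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")

match = phone.search("Call 555-666-7890 today")  # hypothetical sample text
found = match.group() if match else None
print(found)  # 555-666-7890
```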
Let's say that we had a ZIP code, 90210; that's a very famous Beverly Hills ZIP code. Same thing: we just take out the dashes here and make it just five characters long, and now it matches the ZIP code. In the US we only have numbers, but in other countries you are allowed to have things besides just numbers; you might have something like WC2H 9AW in London. To match something like that, think for a second about how we'd match it. Let's back up. Let's simplify it. A set of just 0-9 matches the 2 and the 9, but it doesn't match the letters.
Let's assume it's all uppercase, so adding A-Z will now match all those letters. Now we can just copy that, and let's assume for a moment that it has four characters, a space, and three characters, so we would have four of those, a space, and then three more of those, and now it matches. So you see it's much simpler than having to type that out. Imagine if I tried to type out 0 through 9 and A through Z all of those times for each one of those. So it does save you a lot of typing, and that's really the purpose of ranges.
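Both postal code patterns can be sketched the same way, again in Python as an assumed language. The second pattern follows the simplification made above (four characters, a space, three characters, all digits or uppercase letters), so it is not a general validator for real postcodes.

```python
import re

# Five digits in a row for a US ZIP code.
zip_code = re.compile(r"[0-9][0-9][0-9][0-9][0-9]")
zip_ok = zip_code.fullmatch("90210") is not None
print(zip_ok)  # True

# Each set allows a digit or an uppercase letter: four, a space, then three.
postcode = re.compile(
    r"[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z] [0-9A-Z][0-9A-Z][0-9A-Z]"
)
postcode_ok = postcode.fullmatch("WC2H 9AW") is not None
print(postcode_ok)  # True
```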