Join David D. Levine for an in-depth discussion in this video, Introducing regular expressions, part of SED Essential Training.
- The old string in SED's s command is actually a regular expression. A regular expression is a special kind of string that defines a pattern, which other strings are said to match or not match. Regular expressions are also used in other places in SED, which we'll discuss later. Regular expressions are covered in detail in the lynda course "Using regular expressions," but for now I'm just going to give you a quick introduction in case you haven't encountered them before. The most basic regular expression is a simple string, like /abc/.
The pattern abc matches any other string which contains the letters a, b, and c, in that order, with nothing in between them. Regular expressions in SED are surrounded by slashes, and are always case sensitive. So for example, /abc/ matches abc, but does not match abxc. It does not match ab, because there's no c. And it does not match uppercase ABC, because it's case sensitive. So, for example, in the command sed 's/the/THE/' dukeofyork.txt, the lowercase the is a regular expression that simply matches the letters t, h, and e in that order.
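The matching rules just described can be tried without the course file. This is a minimal sketch: the five input lines and the replacement word MATCH are made up for illustration and are not taken from dukeofyork.txt.

```shell
# Hypothetical sample lines fed to sed on standard input.
# Only lines containing a, b, c in that order, with nothing between them,
# are changed; the rest pass through untouched.
literal=$(printf '%s\n' 'abc' 'xxabcxx' 'abxc' 'ab' 'ABC' | sed 's/abc/MATCH/')
printf '%s\n' "$literal"
# MATCH
# xxMATCHxx
# abxc
# ab
# ABC
```

The last three lines come out unchanged: abxc has an x in the way, ab has no c, and ABC is uppercase.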
In addition to matching simple strings, you can use meta characters to create more sophisticated patterns. The period is the first meta character; it matches any single character. So for example, a.c matches abc. It also matches axc, or a, any single character, c, but it does not match ac. There has to be exactly one character between the a and the c. Likewise, it does not match axxc because there is more than one character between the a and the c.
So, for example, sed 's/he./HEX/g' dukeofyork.txt looks for he followed by any single character and replaces it with uppercase HEX. So in the first line, you see that the, space, grand is now replaced with T, uppercase HEX, grand, and so on. This will look for he followed by any single character, including spaces, letters, numbers, and punctuation, and replace the match with the uppercase word HEX.
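The behavior of the period can be checked the same way. This is a sketch under assumptions: the input lines are invented for illustration, and the single line of text in the second command only stands in for a line of dukeofyork.txt, whose actual contents are not shown here.

```shell
# Hypothetical sample lines: the period needs exactly one character
# between the a and the c.
dot=$(printf '%s\n' 'abc' 'axc' 'a c' 'ac' 'axxc' | sed 's/a.c/MATCH/')
printf '%s\n' "$dot"
# MATCH
# MATCH
# MATCH
# ac
# axxc

# he followed by any single character, here a space, becomes HEX.
hex=$(printf '%s\n' 'the grand old duke' | sed 's/he./HEX/g')
printf '%s\n' "$hex"
# tHEXgrand old duke
```

Note that the space after the is consumed by the period, which is why HEX runs straight into grand.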
The backslash removes the special meaning of any following meta character. So for example, a\.c removes that special meaning of dot, and now looks for a literal period. So a\.c matches a.c. The backslash can remove its own special meaning, such that a\\c matches a\c. And the backslash can also remove the special meaning of the slash, which would otherwise terminate the regular expression.
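The three escapes, for the period, the backslash, and the slash, can be sketched in one run. The input lines and the replacement words DOT, BACKSLASH, and SLASH are assumptions chosen for illustration; the -e option is used here only to give sed several s commands at once.

```shell
# Hypothetical sample lines. printf's %s format passes the backslash
# in 'a\c' through literally.
esc=$(printf '%s\n' 'a.c' 'abc' 'a\c' 'a/c' |
  sed -e 's/a\.c/DOT/' -e 's/a\\c/BACKSLASH/' -e 's/a\/c/SLASH/')
printf '%s\n' "$esc"
# DOT
# abc
# BACKSLASH
# SLASH
```

The line abc is left alone because the escaped period no longer matches any character, only a literal period.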
a\/c matches a/c. If you didn't use that last backslash, the slash would mark the end of the regular expression, assuming you used slashes as delimiters in the s command. The caret and the dollar sign match the beginning and end of the line, respectively, such that ^abc matches the first three characters in the line abcd. But that same regular expression, ^abc, does not match anything in the line dabc, because the first characters of the line are not abc.
abc$ similarly does not match anything in the line abcd, because the dollar sign specifies the end of the line. So this will only match a line that ends with abc, which abcd does not. But, in the line dabc, that abc$ matches the last three characters, the abc at the end of that line. So, for example, if we look for down and replace it with FRED globally in our dukeofyork.txt file, it replaces four instances of down with FRED.
But if we look for down$ and replace it with FRED globally in our dukeofyork.txt file, then it only catches two, the two in which down appears at the end of the line. There are two other occurrences of down within the file, but they are not replaced, because they do not appear at the end of the line. Note that the caret and the dollar sign do not match the first and last characters on the line, but the actual beginning and end of the line itself.
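The anchors can be tried on the abcd and dabc lines from the discussion above. This is a minimal sketch: the replacement words START and END, and the two lines used for the down$ command, are assumptions for illustration rather than the contents of dukeofyork.txt.

```shell
# ^abc only matches at the start of a line, abc$ only at the end.
anchors=$(printf '%s\n' 'abcd' 'dabc' | sed -e 's/^abc/START/' -e 's/abc$/END/')
printf '%s\n' "$anchors"
# STARTd
# dEND

# Hypothetical lines: even with the g flag, down$ only replaces a down
# that sits at the very end of a line.
eol=$(printf '%s\n' 'marched them down again' 'they were down' | sed 's/down$/FRED/g')
printf '%s\n' "$eol"
# marched them down again
# they were FRED
```

In the first of those two lines, down is followed by more text, so the dollar sign keeps the pattern from matching there.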
- Understanding input, output, files, and pipes
- Modifying the "s" command
- Using character classes and quantifiers
- Controlling printing
- Reading and writing files
- Appending, inserting, and editing entire lines
- Writing programs in SED
- Using advanced programming commands