Learn to use a regular expression string to match any text according to the context.
- [Narrator] Hi and welcome to the fourth section of this course, Texting and Driving. The shell scripting language is packed with all the essential problem-solving components for Unix/Linux systems. Text processing is one of the key areas where shell scripting is used, and there are beautiful utilities, such as sed, awk, grep, and cut, which can be combined to solve problems relating to text processing. Various utilities help to process a file at the level of a character, line, word, column, row, and so on, allowing us to manipulate a text file in many ways.
Regular expressions are the core of pattern matching techniques, and most of the text processing utilities come with support for them. By using suitable regular expression strings, we can produce the desired output, such as filtering, stripping, replacing, and searching. This section includes a collection of videos which walk through many kinds of text processing problems that will be helpful in writing real scripts. Now we move on to the first video of section four, Using Regular Expressions. In this video we'll see how regular expressions are a form of tiny, highly specialized programming language used to match text.
Regular expressions are the heart of text processing techniques based on pattern matching. For fluency in writing text processing tools, one must have a basic understanding of regular expressions. With wildcard techniques alone, the scope of matching text with patterns is very limited. Regular expressions are a form of tiny, highly specialized programming language used to match text. A typical regular expression for matching an email address might look like this. If this looks weird, don't worry. It is really simple once you understand the concepts through this video. Regular expressions are composed of text fragments and symbols which have special meanings.
Using these, we can construct any suitable regular expression string to match any text according to the context. As regex is a generic language to match text, we're not introducing any tools in this video. Let's see a few examples of text matching. To match all words in a given text, we can write the regex as follows. The question mark (?) is the notation for zero or one occurrences of the previous expression, which in this case is the space character. This notation represents one or more alphabetic characters.
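As a rough illustration of the word-matching idea, here is a minimal Python sketch. The pattern is an assumption reconstructed from the narration (an optional space, one or more letters, an optional space), since the on-screen regex is not part of the transcript. The name find_words is hypothetical, and Python's re module stands in only because the video itself introduces no tool.

```python
import re

# Assumed pattern: optional space, one or more letters, optional space.
WORD_PATTERN = re.compile(r" ?[a-zA-Z]+ ?")

def find_words(text):
    """Return every run of letters in text, with surrounding spaces removed."""
    return [match.strip() for match in WORD_PATTERN.findall(text)]

print(find_words("this is a test"))  # ['this', 'is', 'a', 'test']
```

Digits and punctuation are simply skipped, because the letter class only accepts a to z in either case.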
To match an IP address, we can write the regex as follows. We know that an IP address is in the form 192.168.0.2. It's in the form of four integers separated by dots. [0-9] or [:digit:] (written [[:digit:]] inside a bracket expression) represents a match for digits from zero to nine.
{1,3} matches one to three digits and \. matches the dot character. This regex will match an IP address in the text being processed. However, it doesn't check the validity of the address. For example, an IP address in the form of 123.300.1.1 will be matched by the regex despite being an invalid IP. This is because when parsing text streams, usually the aim is only to detect IPs. Let's first go through the basic components of regular expressions.
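The IP pattern described above can be sketched in Python as follows. The regex is assembled from the pieces the narration names: [0-9], {1,3}, and \. repeated for four groups. The helper names find_ips and is_valid_ipv4 are hypothetical, and the range check is an added assumption showing one way to do the validation that the regex deliberately skips.

```python
import re

# Four groups of one to three digits, separated by literal dots.
IP_PATTERN = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def find_ips(text):
    """Detect anything shaped like an IP address; no validity check."""
    return IP_PATTERN.findall(text)

def is_valid_ipv4(candidate):
    """Hypothetical extra step: every octet must lie in the range 0..255."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))

sample = "gateway 192.168.0.2, bogus 123.300.1.1"
for ip in find_ips(sample):
    print(ip, is_valid_ipv4(ip))
# 192.168.0.2 True
# 123.300.1.1 False
```

Keeping detection and validation as separate steps mirrors the point made in the narration: the regex finds candidates, and any stricter check happens afterwards.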
The caret (^) specifies the start-of-line marker. For example, ^tux matches a line that starts with tux. The dollar sign ($) specifies the end-of-line marker. For example, tux$ matches a line that ends with tux. The dot (.) matches any one character. For example, hack. matches hack1 and hacki, but not hack12 or hackil. Only one additional character matches.
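A small Python sketch of the two anchors and the dot, under the assumption that the examples are checked one line at a time. The function names are hypothetical. re.fullmatch is used for the dot example so that "only one additional character" is enforced against the whole string.

```python
import re

def starts_with_tux(line):
    # ^ anchors the pattern at the start of the line.
    return re.search(r"^tux", line) is not None

def ends_with_tux(line):
    # $ anchors the pattern at the end of the line.
    return re.search(r"tux$", line) is not None

def is_hack_plus_one(word):
    # . stands for exactly one character of any kind.
    return re.fullmatch(r"hack.", word) is not None

print(starts_with_tux("tux is a penguin"))  # True
print(ends_with_tux("say hello to tux"))    # True
for word in ["hack1", "hacki", "hack12", "hackil"]:
    print(word, is_hack_plus_one(word))
# hack1 True
# hacki True
# hack12 False
# hackil False
```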
Square brackets ([ ]) match any one of the characters enclosed in brackets. For example, coo[kl] matches cook or cool. Brackets that begin with a caret ([^ ]) match any one character except those that are enclosed in brackets. For example, 9[^01] matches 92 and 93 but not 91 and 90.
A hyphen inside brackets ([-]) matches any character within the range specified in brackets. For example, [1-5] matches a digit from one to five. The question mark (?) means the preceding item must match one or zero times. For example, colou?r matches color or colour but not colouur.
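The three bracket forms just described (a set, a negated set, and a range) can be tried out with a short Python sketch. The patterns coo[kl], 9[^01], and [1-5] are assumed from the spoken examples, and whole_match is a hypothetical helper that requires the entire string to match.

```python
import re

def whole_match(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# Set: the last character must be k or l.
print(whole_match(r"coo[kl]", "cook"), whole_match(r"coo[kl]", "coop"))  # True False

# Negated set: the second character may be anything except 0 or 1.
print(whole_match(r"9[^01]", "92"), whole_match(r"9[^01]", "91"))        # True False

# Range: a single digit from 1 to 5.
print(whole_match(r"[1-5]", "3"), whole_match(r"[1-5]", "7"))            # True False
```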
The plus sign (+) means that the preceding item must match one or more times. For example, rollno-9+ matches rollno-99 and rollno-9 but not rollno-. The asterisk (*) means that the preceding item must match zero or more times. For example, co*l matches cl, col, and cool.
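To compare the three repetition markers side by side (zero or one, one or more, zero or more), here is a table-driven Python sketch. The patterns colou?r, rollno-9+, and co*l are assumptions taken from the spoken examples, and the CASES table and whole_match helper are hypothetical names.

```python
import re

# Each entry: (pattern, candidate, whether the whole candidate should match).
CASES = [
    (r"colou?r", "color", True),      # ? allows zero occurrences of u
    (r"colou?r", "colour", True),     # ... or exactly one
    (r"colou?r", "colouur", False),   # but not two
    (r"rollno-9+", "rollno-99", True),
    (r"rollno-9+", "rollno-9", True),
    (r"rollno-9+", "rollno-", False), # + needs at least one 9
    (r"co*l", "cl", True),            # * allows zero occurrences of o
    (r"co*l", "col", True),
    (r"co*l", "cool", True),
]

def whole_match(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

for pattern, text, expected in CASES:
    print(pattern, text, whole_match(pattern, text) == expected)
```

Every line printed by the loop should end in True, meaning the observed result agrees with the expectation in the table.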
Parentheses ( ) treat the terms enclosed as one entity. For example, ma(tri)?x matches max or matrix. {n} means that the preceding item must match n times. For example, [0-9]{3} matches any three-digit number. [0-9]{3} can be expanded as [0-9][0-9][0-9].
{n,} specifies the minimum number of times the preceding item should match. For example, [0-9]{2,} matches any number that is two digits or longer. {n,m} specifies the minimum and maximum number of times the preceding item should match. For example, [0-9]{2,5} matches any number that is two to five digits long.
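Grouping and the counted forms of repetition can be checked with a brief Python sketch. The patterns are assumed from the narration (ma(tri)?x for the group, and [0-9] followed by {3}, {2,}, or {2,5} for the counts), and whole_match is a hypothetical helper.

```python
import re

def whole_match(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# The group (tri) is optional as a unit, not letter by letter.
print([whole_match(r"ma(tri)?x", w) for w in ["max", "matrix", "matrx"]])
# [True, True, False]

# {3} is shorthand for writing the item three times in a row.
print(whole_match(r"[0-9]{3}", "123"), whole_match(r"[0-9][0-9][0-9]", "123"))
# True True

# {2,} sets only a minimum; {2,5} sets a minimum and a maximum.
print([whole_match(r"[0-9]{2,}", n) for n in ["7", "42", "123456"]])
# [False, True, True]
print([whole_match(r"[0-9]{2,5}", n) for n in ["7", "42", "123456"]])
# [False, True, False]
```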
The pipe (|) specifies alternation: one of the items on either side of the pipe should match. For example, Oct (1st|2nd) matches Oct 1st or Oct 2nd. The backslash (\) is the escape character for escaping any of the special characters mentioned previously.
For example, a\.b matches a.b but not ajb. It ignores the special meaning of the dot because of the backslash. For more details on the regular expression components available you can refer to the following URL. Let's see how these special meanings of certain characters are specified in regular expressions. Treatment of special characters. Regular expressions use some characters, such as $, ., *, +, {, and }, as special characters.
But what if you want to use these characters as normal text characters? Let's see an example of a regex, a.txt. This will match the character a, followed by any character (due to the dot character), which is then followed by the string txt. However, we want the dot to match a literal dot instead of any character. In order to achieve this, we precede the character with a backslash. Doing this is called escaping the character. This indicates that the regex wants to match the literal character rather than its special meaning. Hence, the final regex becomes a\.txt.
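The effect of escaping can be seen in a short Python sketch, assuming the spoken pattern is a.txt and its escaped form is a\.txt. The helper whole_match is hypothetical. re.escape is a standard-library function that adds the backslashes for you when the whole string is meant literally.

```python
import re

def whole_match(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

unescaped = r"a.txt"   # the dot matches any single character
escaped = r"a\.txt"    # the dot matches only a literal dot

print(whole_match(unescaped, "a.txt"), whole_match(unescaped, "aXtxt"))  # True True
print(whole_match(escaped, "a.txt"), whole_match(escaped, "aXtxt"))      # True False

# re.escape produces an escaped pattern from a literal string.
literal = re.escape("a.txt")
print(whole_match(literal, "a.txt"), whole_match(literal, "aXtxt"))      # True False
```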
Visualizing regular expressions. Regular expressions can be tough to understand at times, but for people who are good at understanding things with diagrams, there are utilities available to help in visualizing regex. Here is one such tool you can use by browsing to www.regexper.com. It basically lets you enter a regular expression and creates a nice graph to help you understand it. Here is a screenshot showing the regular expression we saw in the previous section. Great. We've successfully learned how to use regular expressions.
In the next video, we'll see how to search and mine text inside a file with grep.
Note: This course was created by Packt Publishing. We are pleased to host this training in our library.
- Printing in the terminal
- Performing math in the Linux shell
- Getting and setting dates
- Working with functions and arguments
- Reading output
- Making comparisons
- Concatenating text
- Finding, editing, generating, and deleting files
- Running parallel processes
- Using regular expressions
- Downloading webpages
- Parsing data from a website
- Finding broken links
- Backing up and archiving
- Transferring files and data through the network
- Monitoring your Linux system
- Gathering data for system administration