Humdrum lab 7

From CCARH Wiki

Regular Expressions

Basic Regular Expressions

"Basic" regular expressions are the syntax used by the initial implementation of grep, which came with Unix in 1973. Here are the "metacharacters" in the basic implementation of regular expressions:

Basic-regular-expressions.png

Dot metacharacter

The dot, or period, character is used to indicate any single character. In the following example, the regular expression "c.t" matches any three-character sequence that starts with "c", ends with "t", and has any single character between the two.

Basic-regular-expression-dot.png
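
The same dot pattern can be tried outside of grep. The following is a minimal sketch that uses Python's re module as a stand-in (an assumption of this sketch; Python's syntax is not identical to grep's basic regular expressions, but the dot behaves the same way). The word list is made up for illustration.

```python
import re

# "c.t": a "c", then any single character, then a "t".
words = ["cat", "cot", "c t", "ct", "coat"]  # hypothetical test strings

# Keep only the strings that contain a match somewhere.
matches = [w for w in words if re.search(r"c.t", w)]
print(matches)
```

Note that "ct" fails because the dot requires exactly one character, and "coat" fails because two characters separate the "c" from the "t".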


Star metacharacter

The star, or asterisk, character is used to indicate that the previous character (or parenthesized group) will be matched if it occurs 0 or more times in the search string.

In the following example, the regular expression "c*t" matches strings that contain zero or more "c" characters followed by the letter "t":

Basic-regular-expression-star.png

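Note that "*" must be preceded by a character (or group) for it to operate on. If the "*" comes at the start of the regular expression, there is nothing to the left of the star to repeat. Some regular-expression implementations report this as an error, while POSIX basic regular expressions treat a leading "*" as a literal asterisk.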
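
A short sketch of the star in action, again using Python's re module as a stand-in for grep (an assumption; the sample text is invented). Python happens to be one of the implementations that rejects a pattern beginning with "*".

```python
import re

# "c*t": zero or more "c" characters followed by a "t".
text = "t ct cct act"  # hypothetical sample text
found = re.findall(r"c*t", text)
print(found)

# A star with nothing to its left: Python's re module raises an error.
try:
    re.compile(r"*t")
    leading_star_ok = True
except re.error:
    leading_star_ok = False
print(leading_star_ok)
```

The last match comes from "act": the "a" is skipped, and "ct" is matched on its own.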


Square-bracket metacharacters

Square brackets enclose a list of allowed characters. The bracketed list matches exactly one character in the search string, and that character must be one of those in the list.

Basic-regular-expression-square-brackets.png
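
As a hedged illustration, the pattern "c[aou]t" (chosen here for the example, not taken from the figure) can be tested with Python's re module standing in for grep:

```python
import re

# "c[aou]t": a "c", then exactly one of "a", "o", or "u", then a "t".
words = ["cat", "cot", "cut", "cet", "caot"]  # hypothetical test strings
matches = [w for w in words if re.search(r"c[aou]t", w)]
print(matches)
```

"caot" does not match even though both "a" and "o" are in the list, because the brackets stand for a single character.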


There is more syntax related to square brackets. You can negate the list by adding "^" as the first character. For example, "[^aeiou]" matches any single character that is not a lowercase vowel.

Basic-regular-expression-square-brackets-negate.png
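
A minimal sketch of the negated list, using Python's re module in place of grep (an assumption; the input strings are invented). It also shows that "not a vowel" includes spaces and any other non-vowel character, not just consonants.

```python
import re

# "[^aeiou]": any single character that is NOT a lowercase vowel.
non_vowels = re.findall(r"[^aeiou]", "regular")
print(non_vowels)

# Spaces (and digits, punctuation, uppercase letters) also count as "not a vowel".
with_space = re.findall(r"[^aeiou]", "a b")
print(with_space)
```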


Another syntax is a character range, such as "[0-9]" which is equivalent to "[0123456789]", or "[A-Ga-g]" which is equivalent to "[ABCDEFGabcdefg]".

Basic-regular-expression-square-brackets-range.png
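
The equivalence of a range and its spelled-out list can be checked directly. This sketch uses Python's re module as a stand-in for grep (an assumption), with a made-up sample string:

```python
import re

sample = "4c 8a# 16G h"  # hypothetical sample text

# "[0-9]" and "[0123456789]" find exactly the same characters.
digits_range = re.findall(r"[0-9]", sample)
digits_list = re.findall(r"[0123456789]", sample)
print(digits_range == digits_list)

# "[A-Ga-g]": any single letter from A to G in either case.
letters = re.findall(r"[A-Ga-g]", sample)
print(letters)
```

The "h" is not matched because it falls outside both ranges.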


Caret metacharacter

The caret metacharacter (^) is a line *anchor*. This character indicates that the matched characters (that follow) must occur at the start of the line. Notice that this character does double duty, as it is also the negation metacharacter when at the start of a list in square brackets!

In the following example "^cat" matches only the first occurrence of "cat" on the line, because that is the one at the start of the line:

Basic-regular-expression-square-brackets-carat.png
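
A small sketch of the start-of-line anchor, with Python's re module standing in for grep (an assumption) and an invented line containing "cat" twice:

```python
import re

line = "cat and another cat"  # hypothetical line with two occurrences

# "^cat" can only match at position 0, the start of the line.
start_positions = [m.start() for m in re.finditer(r"^cat", line)]
print(start_positions)

# If the line does not begin with "cat", there is no match at all.
no_match = re.search(r"^cat", "the cat")
print(no_match)
```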


Dollar metacharacter

The dollar metacharacter ($) is another line *anchor*. This character indicates that the matched characters (that precede) must occur at the end of the line.

In the following example "cat$" matches only the second occurrence of "cat" on the line, because that is the one at the end of the line:

Basic-regular-expression-square-brackets-dollar.png
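
The end-of-line anchor can be sketched the same way, using Python's re module as a stand-in for grep (an assumption) and an invented line:

```python
import re

line = "cat and another cat"  # hypothetical line with two occurrences

# "cat$" can only match the "cat" that ends the line.
end_match = re.search(r"cat$", line)
print(end_match.start())

# If the line does not end with "cat", there is no match at all.
no_match = re.search(r"cat$", "cat food")
print(no_match)
```

The reported position is that of the second "cat"; the first one is ignored because other characters follow it.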


Backslash metacharacter

The backslash metacharacter is used to un-metafy a metacharacter, turning it into a normal character. For example "c*" means zero or more letter c's, while "c\*" means the letter c followed by an asterisk in a matched string:


Basic-regular-expression-backslash.png
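
The difference between "c*" and "c\*" can be seen by requiring each pattern to match an entire string. This is a minimal sketch using Python's re module as a stand-in for grep (an assumption); the test strings are made up.

```python
import re

# "c*": zero or more "c" characters, so "ccc" matches in full.
star_meta = re.fullmatch(r"c*", "ccc") is not None
print(star_meta)

# "c\*": a "c" followed by a literal asterisk, so "ccc" does not match...
escaped_on_ccc = re.fullmatch(r"c\*", "ccc") is not None
print(escaped_on_ccc)

# ...but the two-character string "c*" does.
escaped_on_literal = re.fullmatch(r"c\*", "c*") is not None
print(escaped_on_literal)
```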