A regular expression (RE) is a string of characters that describes a set of character strings to be matched. For example, a global search for every occurrence of the word "and", however it is capitalised, has to find "and", "And", "AnD", "AND", etc. Without regular expressions, finding all eight possible spellings would require eight separate searches. Using an RE, the search can be done with one command.
Regular expressions are used by many Unix utilities, including:
ed
ex
vi
grep
sed
awk
(The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code. Awk is not covered in this course, but the GAWK Manual is a good guide to its use.)
Regular expressions are used in searches and substitutions.
A character string is the simplest regular expression; it simply matches the string itself. For example:
/hello/ - matches 'hello'
s/hello/goodbye/ - matches 'hello' and makes a substitution
The '.' character matches any single character. For example:
/p.t/ - matches 'p' and 't' separated by a single character, e.g. 'pit', 'put', 'pot', etc.
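As a small illustration, the pattern can be tried with grep on a few words. The word list is made up for this sketch and is not part of the course material.

```shell
# Sample words (invented for this illustration), one per line.
words=$(printf '%s\n' pit put pot pt port spot)

# 'p.t' needs exactly one character between 'p' and 't', so 'pt'
# (nothing in between) and 'port' (two characters) are not selected.
# 'spot' is selected because it contains 'pot'.
matched=$(printf '%s\n' "$words" | grep 'p.t')
printf '%s\n' "$matched"
```

Note that grep selects any line containing a match, which is why 'spot' appears in the output.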
The expression /[RE]/ matches any one character from the set listed inside the square brackets, in a single character position. For example:
/x[ab2X]y/ - matches any of the following: xay xby x2y xXy
In the expression /[RE]/ a range of characters can be specified. For example:
[a-z] - matches any single lower case character
[0-9] - matches any single digit
Note, however:
[0-57] - matches any one of the following: 0 1 2 3 4 5 7
i.e. 0-5 and 7, not the numbers 0 to 57. Sets of characters can be combined:
[a-d5-8X-Z] - matches any one of the following: a b c d 5 6 7 8 X Y Z
It is possible to specify a set of characters which are not to be matched in the RE. For example:
[^0-9] - matches any single character which is not a digit
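The bracket expressions above can be checked with grep. This is only a sketch: the input words are invented, and the variable names are arbitrary. It also shows how the eight spellings of "and" from the introduction can be found with one RE.

```shell
# One bracketed set per letter covers all eight spellings of "and".
# 'any' is included to show a line that is not counted.
and_count=$(printf '%s\n' and And aNd anD ANd AnD aND AND any \
  | grep -c '[Aa][Nn][Dd]')
echo "$and_count"

# [0-57] is the range 0-5 plus the single character 7.
digits=$(printf '%s\n' 0 3 5 6 7 8 | grep '[0-57]' | paste -s -d ' ' -)
echo "$digits"

# [^0-9] matches one character that is not a digit, so a line is
# selected only if it contains at least one non-digit character.
non_digit=$(printf '%s\n' 123 12a 4-5 | grep '[^0-9]' | paste -s -d ' ' -)
echo "$non_digit"
```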
An anchor is used to match an RE only when it is found at a particular position in the line. For example:
/^RE/ - matches RE at the start of a line
/RE$/ - matches RE at the end of a line
/^RE$/ - matches RE as the whole line
Note that there are two separate uses of the '^' character. One is as the start-of-line anchor, and the other is as the 'logical not' operator. The latter applies only when '^' is the first character inside square brackets.
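The effect of the anchors can be seen by counting matching lines with grep -c. The four sample lines below are invented for this sketch.

```shell
# Four sample lines (made up for this illustration).
lines=$(printf '%s\n' 'cat' 'cat food' 'tom cat' 'concatenate')

starts=$(printf '%s\n' "$lines" | grep -c '^cat')   # 'cat', 'cat food'
ends=$(printf '%s\n' "$lines" | grep -c 'cat$')     # 'cat', 'tom cat'
whole=$(printf '%s\n' "$lines" | grep -c '^cat$')   # 'cat' only

# Both uses of '^' together: the first is the start-of-line anchor,
# the second negates the set, selecting lines that do not begin with 'c'.
not_c=$(printf '%s\n' "$lines" | grep '^[^c]')

echo "$starts $ends $whole"
printf '%s\n' "$not_c"
```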
Multiple occurrences of REs can be specified. For example:
a* - matches 0 or more occurrences of 'a'
aa* - matches 1 or more occurrences of 'a'
.* - matches any string of characters
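A short grep sketch of the '*' operator follows. The -x option, which makes grep select only lines matched in full, is used here so that the difference between a* and aa* is visible; the sample lines are invented.

```shell
# a*b: zero or more 'a' followed by 'b' matches all three lines.
zero_or_more=$(printf '%s\n' b ab aab | grep -c -x 'a*b')

# aa*b: at least one 'a' is required, so the line 'b' is excluded.
one_or_more=$(printf '%s\n' b ab aab | grep -c -x 'aa*b')

# .* matches any run of characters, including an empty one, so both
# 'start end' and 'startend' are selected but 'end start' is not.
any_between=$(printf '%s\n' 'start end' 'startend' 'end start' \
  | grep -c 'start.*end')

echo "$zero_or_more $one_or_more $any_between"
```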
A null RE (//) stands for the last RE used. For example:
:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(The blue car)/p
(The blue car) exploded with a roar.
The '&' character in a replacement string stands for the most recently matched string. For example:
:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(&)/p
(The blue car) exploded with a roar.
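The examples above are editor sessions. The same two features, the null RE and '&', can also be tried non-interactively. The sketch below uses sed rather than the editor, which is an assumption of this illustration and not how the original session was run.

```shell
line='The blue car exploded with a roar.'

# '&' in the replacement stands for whatever the RE matched.
wrapped=$(printf '%s\n' "$line" | sed 's/[Tt]he.*car/(&)/')
printf '%s\n' "$wrapped"

# The empty RE in s//.../ reuses the last RE used, here the address
# /[Tt]he.*car/ that selected the line. -n with the p flag prints only
# the lines on which a substitution was made.
reused=$(printf '%s\n' "$line" | sed -n '/[Tt]he.*car/s//(&)/p')
printf '%s\n' "$reused"
```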
A sub-expression in an RE can be marked and then referred to later, either in the same RE or in the replacement string.
\(string\) - defines an RE sub-expression
\n - refers to the nth RE sub-expression
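NOTE: The backslash is the escape character for REs. This means it neutralises the special meaning of a special character; for instance, \. matches a literal full stop. Before ( and ) it does the reverse and gives them their grouping meaning. For example, to swap two sub-expressions: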
:p
A line of text
:s/\(line\).*\(text\)/\2 \1/p
A text line
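The swap can be reproduced with sed, used here in place of the editor as an assumption of this sketch. Note the space between \2 and \1 in the replacement: the whole match 'line of text' is replaced, so without that space the two words would be joined together.

```shell
# \(line\) is sub-expression 1 and \(text\) is sub-expression 2.
swapped=$(printf '%s\n' 'A line of text' \
  | sed 's/\(line\).*\(text\)/\2 \1/')
printf '%s\n' "$swapped"

# A backslash before '.' removes its special meaning, so 'a\.b'
# matches 'a.b' but not 'axb'.
literal=$(printf '%s\n' 'a.b' 'axb' | grep -c 'a\.b')
echo "$literal"
```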
It is also possible to specify an exact number, or a range, of occurrences of an RE. For example:
c\{4\} - matches exactly 4 c's
c\{4,\} - matches 4 or more c's
c\{2,4\} - matches between 2 and 4 c's
For example, to find a line containing at least 5 consecutive digits:
/[0-9]\{5\}/
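The repetition counts can be tried with grep. In this sketch the -x option (match the whole line) is used for the c examples so that the counts are exact, and the sample lines are invented.

```shell
exactly_4=$(printf '%s\n' ccc cccc ccccc | grep -c -x 'c\{4\}')
four_plus=$(printf '%s\n' ccc cccc ccccc | grep -c -x 'c\{4,\}')
two_to_4=$(printf '%s\n' c cc ccc cccc ccccc | grep -c -x 'c\{2,4\}')
echo "$exactly_4 $four_plus $two_to_4"

# Without anchors, [0-9]\{5\} selects any line containing a run of
# five digits, so a longer run of digits is selected as well.
five=$(printf '%s\n' 'tel 1234' 'zip 12345' 'id 1234567' \
  | grep '[0-9]\{5\}' | paste -s -d ',' -)
echo "$five"
```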
^   start of line anchor (or NOT operator inside [ ])
$   end of line anchor
.   any single character
*   preceding character repeated zero or more times
\   escape character
[ ] contains a set or range of characters
&   string matched by the RE (used in the replacement string)
Note that any regular expression can be used with grep. (It gets its name from the editor command g/RE/p, which means 'globally search for RE and print it'.) This opens up many new possibilities for the use of grep. Unix commands that use regular expressions often make the use of an editor unnecessary.
Obtain a listing of the members of your group from the password file using grep.
sed is a non-interactive stream editor which is used to filter and transform text. The command to invoke sed is:
sed [-n] [-e command] [-f edfile] [input_file]
For example:
sed "s/UNIX/Unix/g" thesis > thesis.new
This will process the file thesis line by line, outputting each line to the file thesis.new and replacing each occurrence of the string "UNIX" with "Unix".
In the above example every line of thesis will be output to thesis.new, irrespective of whether it has been changed or not. This is because the default output for sed is every line of the input. Using the -n option suppresses the default output, so that only explicitly specified lines are output. This means that no lines would be output by the following command:
sed -n "s/UNIX/Unix/g" thesis > thesis.new
since a change but no output has been specified. If a print command is added, as follows:
sed -n "s/UNIX/Unix/gp" thesis > thesis.new
then only those lines in which "UNIX" had been changed to "Unix" would be output.
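The three variants can be compared side by side. In this sketch a few invented lines of text are piped into sed in place of the file thesis, and the results are kept in shell variables instead of thesis.new.

```shell
# Three sample lines standing in for the file 'thesis'.
text=$(printf '%s\n' 'UNIX is old.' 'Nothing here.' \
  'I like UNIX, UNIX likes me.')

# Default: every line is output, changed or not.
all_lines=$(printf '%s\n' "$text" | sed 's/UNIX/Unix/g')

# -n without a print command: the changes are made but nothing is output.
silent=$(printf '%s\n' "$text" | sed -n 's/UNIX/Unix/g')

# -n with the p flag: only the lines that were changed are output.
changed=$(printf '%s\n' "$text" | sed -n 's/UNIX/Unix/gp')

printf '%s\n' "$changed"
```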
As the examples above show, the -e option is not necessary when there is only one editor command. It is possible to specify more than one command, and in this case each must be preceded by -e. For example:
% sed -e "s/a/A/" -e "s/b/B/" file1 > file2
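This command will carry out the two substitutions on each line of file1. Because neither command ends with g, only the first 'a' and the first 'b' on each line are changed.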
The -f option enables the user to use a file containing editor commands, instead of typing out a series of commands with the -e option.
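The -e and -f forms can be compared as follows. This is a sketch only: the input line is invented, and the script file is a temporary file created with mktemp, which is assumed to be available on the system.

```shell
# Two commands given on the command line with -e.
with_e=$(printf '%s\n' 'abba cab' | sed -e 's/a/A/' -e 's/b/B/')

# The same two commands stored one per line in a file and read with -f.
script=$(mktemp)
printf '%s\n' 's/a/A/' 's/b/B/' > "$script"
with_f=$(printf '%s\n' 'abba cab' | sed -f "$script")
rm -f "$script"

# Only the first 'a' and the first 'b' on the line are changed.
printf '%s\n' "$with_e" "$with_f"
```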
The sed command to list only ordinary files (excluding directories) is:
% ls -l | sed -n "/^-/p"
-rw------- 1 lnp5jb 1765 mbox
-rw------- 1 lnp5jb 320 example1
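The filter can be tried on a fixed listing so that the result does not depend on the current directory. In the sketch below the 'total' and 'Mail' lines are invented; the pattern is anchored with ^ so that only lines whose first character is '-', the ls -l mark for an ordinary file, are printed.

```shell
# A canned 'ls -l' style listing; the first two lines are made up.
listing=$(printf '%s\n' \
  'total 8' \
  'drwx------ 2 lnp5jb 512 Mail' \
  '-rw------- 1 lnp5jb 1765 mbox' \
  '-rw------- 1 lnp5jb 320 example1')

# Directories start with 'd', so they are not printed.
files_only=$(printf '%s\n' "$listing" | sed -n '/^-/p')
printf '%s\n' "$files_only"
```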
The sed command to extract a list of usernames from the password file is:
% sed "s/:.*//" /etc/passwd | more
What this does is to delete everything from the first ':' to the end of each line of the password file, leaving only the username.
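The same substitution can be tried on sample data without reading the real password file. The two entries below are invented and only follow the usual colon-separated layout.

```shell
# Two made-up password file entries.
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'lnp5jb:x:1001:100:Example User:/home/lnp5jb:/bin/csh')

# ':.*' matches from the first colon to the end of the line, and the
# empty replacement deletes it, leaving the first field.
users=$(printf '%s\n' "$passwd_sample" | sed 's/:.*//')
printf '%s\n' "$users"
```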
1. Reproduce the effects of the above sed examples using grep instead. Note that grep is generally better for searches such as these, while sed can be used to make changes to files.
2. Find the system's games directory and type quiz function ed-command to do the ed commands quiz. Don't worry if there are a couple of things that you haven't come across. Try it again and see if you improve your score.