In the last post I said that most characters in regular expressions
match themselves. The letter
a will match strings with the letter “a”
in them, and so on. The real power comes out when you can match groups
of characters. The most common of these is the dot
. (period, full
stop, etc.). This matches any single character that is not an end of line.
So the regex
ab.d will match the strings “abcd”, “abdd”, “abCd”, and so on.
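
To make the dot concrete, here is a minimal sketch in Python. The choice of Python and its re module is an assumption, since the post does not name a language, and the name dot_results is made up for the example.

```python
import re

pattern = re.compile(r"ab.d")

# fullmatch() requires the whole string to fit the pattern,
# so the dot has to account for exactly one character.
samples = ["abcd", "abdd", "abCd", "abd", "ab\nd"]
dot_results = {s: bool(pattern.fullmatch(s)) for s in samples}

for text, matched in dot_results.items():
    print(repr(text), matched)
```

The first three strings match. “abd” fails because the dot must consume one character, and the string with a newline in the middle fails because the dot skips end-of-line characters by default.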
If you want to match a digit, you can use
\d. This will match any
of the characters 0-9. So the regex
a\d will match the strings
“a0”, “a4”, etc.
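
A short sketch of \d, again assuming Python. One caveat specific to that assumption: Python 3 lets \d match digits from other scripts unless the re.ASCII flag is given, so the flag is used here to get the 0-9 behaviour described above. The name digit_hits is hypothetical.

```python
import re

# re.ASCII restricts \d to the characters 0-9.
pattern = re.compile(r"a\d", re.ASCII)

candidates = ["a0", "a4", "ab", "a", "a\u0663"]  # the last one ends in an Arabic-Indic digit
digit_hits = [s for s in candidates if pattern.fullmatch(s)]
print(digit_hits)  # ['a0', 'a4']
```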
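If you want to match a word character, you can use
\w. This will match
any letter in the ASCII (English) alphabet, so a through z in either
case, as well as the digits 0-9 and the underscore. So
\w will match “A”, “a”, “q”, etc. But in engines where it is ASCII-only
(JavaScript, for example) it will not match things
like characters in Hebrew, Russian, Greek, etc.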
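
Here is a small check of which characters count as word characters. It assumes Python with the re.ASCII flag, because Python 3 would otherwise treat letters from other alphabets as word characters too. The name word_results is made up for the example.

```python
import re

# With re.ASCII, \w is equivalent to [a-zA-Z0-9_].
word_char = re.compile(r"\w", re.ASCII)

samples = ["A", "a", "q", "7", "_", "ש", "д", "λ", "-"]
word_results = {ch: bool(word_char.fullmatch(ch)) for ch in samples}

for ch, matched in word_results.items():
    print(ch, matched)
```

The ASCII letters, the digit and the underscore match. The Hebrew, Russian and Greek letters do not, and neither does the dash.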
If you want to match a white space character, you can use
\s. This will match space, tab, as well as various Unicode versions of those.
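
A quick sketch of the white space matcher, assuming Python 3, where \s on ordinary strings is Unicode-aware by default. The dictionary names are hypothetical.

```python
import re

space = re.compile(r"\s")  # default Unicode mode for str patterns

candidates = {
    "space": " ",
    "tab": "\t",
    "no-break space": "\u00a0",
    "em space": "\u2003",
    "letter x": "x",
}
space_results = {name: bool(space.fullmatch(ch)) for name, ch in candidates.items()}

for name, matched in space_results.items():
    print(name, matched)
```

Everything except the letter matches, including the two Unicode spaces.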
Finally, you can invert any of those matchers by capitalizing it. So
\W will match a non-word character, or
\S a non-space character.
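
The capitalized versions are handy for pulling text apart. A minimal sketch, assuming Python; the variable names are made up for the example.

```python
import re

# \W+ matches runs of non-word characters, so splitting on it leaves the words.
words = re.split(r"\W+", "hello, world! 42")
print(words)  # ['hello', 'world', '42']

# \S+ matches runs of non-space characters.
chunks = re.findall(r"\S+", "  two  words ")
print(chunks)  # ['two', 'words']

# \D matches anything that is not a digit, so removing it keeps only the digits.
digits_only = re.sub(r"\D", "", "555-1234")
print(digits_only)  # 5551234
```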
If you want to create your own class, you can do so with square
brackets. So the regex
[abc] will match the letters “a”, “b” or
“c”. If you don't want to list the entire range, you can use a dash, so
[a-z] will match any lowercase letter, and
[0-9a-f] will match 0-9
or a-f. If you want to invert a class, use a
^ so the pattern
[^aeiou] will match any character that is not a lowercase vowel.
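
The three kinds of custom class can be tried out like this. It is a sketch assuming Python, with hypothetical variable names.

```python
import re

# A plain class: any one of the listed characters.
abc_hits = re.findall(r"[abc]", "a cab")
print(abc_hits)  # ['a', 'c', 'a', 'b']

# Ranges: lowercase hex digits only, so uppercase letters are rejected.
is_lower_hex = bool(re.fullmatch(r"[0-9a-f]+", "c0ffee"))
is_upper_hex = bool(re.fullmatch(r"[0-9a-f]+", "C0FFEE"))
print(is_lower_hex, is_upper_hex)  # True False

# An inverted class matches everything not listed, including spaces and digits.
non_vowels = re.findall(r"[^aeiou]", "regex 101")
print(non_vowels)  # ['r', 'g', 'x', ' ', '1', '0', '1']
```

Note that the inverted class picks up the space and the digits as well as the consonants, since it excludes only the five listed letters.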