Classes of characters
Character classes ([ ]) seem to be an easy concept, borrowed from the Unix shell. The example is self-explanatory.
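Since the original example is not reproduced here, the sketch below uses a hypothetical class, [aeiou], with Python's `re` module. It also uses Python's `fnmatch` module to show the same bracket syntax in a shell-style glob. The sample strings are illustrative assumptions.

```python
import re
from fnmatch import fnmatch

# Hypothetical class: [aeiou] matches any ONE of the listed vowels.
vowels = re.compile(r"[aeiou]")
found = vowels.findall("regular expression")
print(found)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# The same bracket idea in a shell-style glob, via Python's fnmatch module.
glob_hit = fnmatch("file1.txt", "file[0-9].txt")
glob_miss = fnmatch("filex.txt", "file[0-9].txt")
print(glob_hit, glob_miss)  # True False
```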
To avoid writing out common classes, we can use the following shortcuts:
- \d which expands to [0-9]
- \D to [^0-9]
- \s to [ \t\n\r\f\v]
- \S to [^ \t\n\r\f\v]
- \w to [a-zA-Z0-9_]
- \W to [^a-zA-Z0-9_]
(Note that \b matches at the boundary between a \w and a \W character, and also at the start or end of the string next to a \w character.)
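The sketch below checks the shortcut list against Python's `re` module. One assumption matters here: in Python 3, str patterns make \d, \s and \w Unicode-aware by default, so the listed ASCII expansions hold exactly only with the `re.ASCII` flag. The last lines illustrate \b on a sample string chosen for illustration.

```python
import re

# Each shortcut from the list, paired with the explicit class it stands for.
PAIRS = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\s", r"[ \t\n\r\f\v]"),
    (r"\S", r"[^ \t\n\r\f\v]"),
    (r"\w", r"[a-zA-Z0-9_]"),
    (r"\W", r"[^a-zA-Z0-9_]"),
]

def same_on_ascii(shortcut, explicit):
    """True if both patterns accept exactly the same single ASCII characters."""
    a = re.compile(shortcut, re.ASCII)  # re.ASCII restricts \d, \s, \w to ASCII
    b = re.compile(explicit)
    return all(
        bool(a.fullmatch(chr(c))) == bool(b.fullmatch(chr(c))) for c in range(128)
    )

all_agree = all(same_on_ascii(s, e) for s, e in PAIRS)
print(all_agree)  # True

# Without re.ASCII, Python's \d also accepts non-ASCII digits such as U+0663.
print(bool(re.fullmatch(r"\d", "\u0663")))            # True
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False

# \b sits between \w and \W (or at a string edge next to \w):
# the "cat" inside "concat" has no boundary before it, so it is skipped.
cats = re.findall(r"\bcat\b", "cat concat cat.")
print(cats)  # ['cat', 'cat']
```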
What is not self-explanatory about classes is which characters change their meaning inside them. Here are the rules:
Most special RegEx characters are not special in a class. For example,
[(a*)+]* has just one repetition character (the final *), not three. In
a*(b+c)a
it matches the runs a*(, + and )a.
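A short Python sketch of the rule above, using the pattern and text from the example. The swap of the trailing * for + in `findall` is an illustrative choice, since `findall` would also report empty matches for the * version.

```python
import re

text = "a*(b+c)a"

# With the original trailing *, search returns the first (longest) run.
first = re.search(r"[(a*)+]*", text).group()
print(first)  # a*(

# Swapping the trailing * for + avoids the empty matches findall would also report.
runs = re.findall(r"[(a*)+]+", text)
print(runs)  # ['a*(', '+', ')a']

# Outside a class the same characters are operators: (a*)+ is a repeated group.
print(re.fullmatch(r"(a*)+", "aaa") is not None)  # True
print(re.fullmatch(r"(a*)+", "a*") is None)       # True: * is not a literal here
```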
^ (caret) is not special unless it is first in a class, where it negates the class. The pattern
[a2^]+ will match everything here:
a^2
but with the caret moved to the front, as in [^a2]+, only the ^ is matched.
That example is a bit tricky because the negated class matches ^ anyway.
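The Python sketch below contrasts the two positions of the caret on the text a^2. The negated pattern [^a2]+ is a hypothetical counterpart chosen for illustration. The last lines show the caret outside a class, which is extra context rather than part of the rule above.

```python
import re

text = "a^2"

# ^ is not first, so it is just one more member of the class.
plain = re.findall(r"[a2^]+", text)
print(plain)    # ['a^2']

# Hypothetical counterpart with ^ first: the class now means "not a, not 2",
# which still includes the ^ character itself.
negated = re.findall(r"[^a2]+", text)
print(negated)  # ['^']

# Outside a class ^ anchors to the start of the string; escape it for a literal.
literal = re.findall(r"\^", text)
print(literal)  # ['^']
```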
- (hyphen) is not special when it is at the front or end of a class, or when it is escaped. Consider:
a--b
That text can be matched in full by
[-ab]+ but not by
[a-b]+, where the hyphen forms the range from a to b (so only a and b are matched), unless it is escaped like this:
[a\-b]+. Moreover, the hyphen is not special outside classes.
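This Python sketch runs each hyphen variant from the rule against a--b. The [ab-]+ variant, with the hyphen at the end, is an extra case added to illustrate the "end of a class" part of the rule.

```python
import re

text = "a--b"
patterns = [r"[-ab]+", r"[ab-]+", r"[a-b]+", r"[a\-b]+"]

# fullmatch succeeds only if the class accepts every character of "a--b".
results = {p: re.fullmatch(p, text) is not None for p in patterns}
for p, ok in results.items():
    print(f"{p:10} {ok}")
# [a-b]+ is the only False: a-b is a range, so the class holds only a and b.

pieces = re.findall(r"[a-b]+", text)
print(pieces)  # ['a', 'b']

# Outside a class the hyphen needs no escaping.
outside = re.findall(r"a-+b", text)
print(outside)  # ['a--b']
```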
[ (open bracket) is not special inside a class. That also means that a class cannot contain another class. The class shortcuts, however, offer a partial way around that constraint. The pattern
[\w ]+ will match all of:
juicy fruit
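A Python sketch of shortcuts inside a class, plus a literal [ inside a class. The extra sample strings are assumptions for illustration. One Python-specific detail: Python 3.7 and later emit a FutureWarning ("Possible nested set") when [ directly follows the opening bracket, so this sketch places x first in the class.

```python
import re

# \w keeps its meaning inside a class, so [\w ] is "word characters or a space".
whole = re.fullmatch(r"[\w ]+", "juicy fruit") is not None
print(whole)  # True

parts = re.findall(r"[\w ]+", "juicy fruit, ripe!")
print(parts)  # ['juicy fruit', ' ripe']

# An unescaped [ inside a class is an ordinary character, not a nested class.
# x comes first to avoid Python's "Possible nested set" FutureWarning for "[[".
brackets = re.findall(r"[x[]", "[x]")
print(brackets)  # ['[', 'x']
```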