Classes of characters

Character classes ([]) seem to be an easy concept, borrowed from the Unix shell. The example [aeiou] is self-explanatory: in the sample text Mangosteen it matches each of the vowels a, o, e and e. To avoid writing out common classes, we can use the following shortcuts:

  • \d which expands to [0-9]
  • \D to [^0-9]
  • \s to [ \t\n\r\f\v]
  • \S to [^ \t\n\r\f\v]
  • \w to [a-zA-Z0-9_]
  • \W to [^a-zA-Z0-9_]

(Note that \b matches the position between a \w character and a \W character, or between a \w character and the start or end of the text.)
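
The equivalences above can be checked with a short sketch. It assumes Python's re module, which the text does not name, and the names SHORTCUTS and same_matches are made up for illustration. One caveat specific to Python: for str patterns the shortcuts also match Unicode digits, whitespace and letters unless the re.ASCII flag is set, so the flag is used here to get exactly the classes listed.

```python
import re

# Each shortcut paired with the explicit class from the list above.
# SHORTCUTS and same_matches are illustrative names, not a library API.
SHORTCUTS = {
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\s": r"[ \t\n\r\f\v]",
    r"\S": r"[^ \t\n\r\f\v]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
}


def same_matches(text):
    """Return True if every shortcut finds the same characters as its explicit class."""
    return all(
        # re.ASCII limits \d, \s and \w to the ASCII ranges shown in the list.
        re.findall(shortcut, text, flags=re.ASCII) == re.findall(explicit, text)
        for shortcut, explicit in SHORTCUTS.items()
    )


sample = "Room 42,\tfloor_3 B-wing\n"
print(same_matches(sample))  # True

# \b is zero-width: it matches positions, not characters.
boundaries = [m.start() for m in re.finditer(r"\b", "hi there")]
print(boundaries)  # [0, 2, 3, 8]
```

The boundary positions 0 and 8 are the start and end of the text, and 2 and 3 are the two sides of the space.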

What is not self-explanatory about classes is which characters change their meaning inside them. Here are the rules:

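  1. Apart from the exceptions below and the backslash, the special regex characters are not special in a class. E.g. [(a*)+]* has just one repetition character (the final *), not three. In the sample text a*(b+c)a it matches a*(, + and )a, that is, everything except b and c.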
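  2. ^ (caret) is not special unless it is first in a class, where it negates the class. The pattern [a2^]+ matches the whole sample text a^2, but [^a2]+ matches only the ^ in it. The example is a bit tricky because ^ is matched in both cases: as a literal member of the first class, and as a character other than a and 2 by the second.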
  3. - (hyphen) is not special when it is at the front or end of a class, or when it is escaped. The whole sample text a--b is matched by [ab-]+ or [-ab]+, but not by [a-b]+, which is the range from a to b and so matches only the letters. Escaping the hyphen, as in [a\-b]+, makes it literal again. Moreover, the hyphen is not special outside classes.
  4. [ (open bracket) is not special inside a class. E.g. [[i]+ matches i[ in the sample text i[0]. That also means that a class cannot contain another class. Class shortcuts are a way around that constraint: the pattern [\w ]+ matches the whole sample text juicy fruit.
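
The four rules can be tried out with the sample texts above. This is a minimal sketch assuming Python's re module (the text does not name an engine, and other engines may differ in details); the helper spans and the results dictionary are made-up names. Python currently emits a FutureWarning for a class that starts with a literal [, so the sketch silences that warning for the rule 4 example.

```python
import re
import warnings


def spans(pattern, text):
    """Return the non-empty substrings of text matched by pattern."""
    return [m.group() for m in re.finditer(pattern, text) if m.group()]


results = {}

# Rule 1: ( * ) + are literal inside the class; only the last * repeats.
results["rule1"] = spans(r"[(a*)+]*", "a*(b+c)a")     # ['a*(', '+', ')a']

# Rule 2: ^ is literal unless it comes first, where it negates the class.
results["rule2_literal"] = spans(r"[a2^]+", "a^2")    # ['a^2']
results["rule2_negated"] = spans(r"[^a2]+", "a^2")    # ['^']

# Rule 3: - is literal at the front or end of a class, or when escaped.
results["rule3_end"] = spans(r"[ab-]+", "a--b")       # ['a--b']
results["rule3_front"] = spans(r"[-ab]+", "a--b")     # ['a--b']
results["rule3_range"] = spans(r"[a-b]+", "a--b")     # ['a', 'b']
results["rule3_escaped"] = spans(r"[a\-b]+", "a--b")  # ['a--b']

# Rule 4: [ is literal inside a class. Python warns about a possible
# nested set here, so the warning is silenced for this one pattern.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    results["rule4_bracket"] = spans(r"[[i]+", "i[0]")        # ['i[']
results["rule4_shortcut"] = spans(r"[\w ]+", "juicy fruit")   # ['juicy fruit']

for name, found in results.items():
    print(name, found)
```

The helper drops empty matches because a pattern ending in * also matches the empty string at every position where the class does not apply.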