In RegEx, some of the more sophisticated functionality is called extensions. But we cannot extend RegEx... All extensions start with the (? string and run up to the closing ), of course. Those surrounding parentheses don't generate a group, except for one extension, described below.
In (?FLAGS), FLAGS can be any sequence of the letters i, L, m, s, u, x. These letters switch on the RegEx flags which would normally be set by the searching/compilation functions. It is a convenient way of passing flags on. This example, (?i)pineapple, ignores letter case:
Pineapple
It is better to put this extension at the beginning of the pattern, because some of the flags will not work if applied too late. Remember that RegEx doesn't have memory and reads strings forward, not looking back.
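To try the inline flag outside the page, here is a minimal sketch. It assumes Python's re module, which the flag letters above suggest; the variable names are only for illustration. It compares (?i) with passing the flag through the search function.

```python
import re

# Inline flag: (?i) at the start makes the whole pattern case-insensitive.
inline = re.search(r"(?i)pineapple", "Pineapple")

# The same flag passed the usual way, through the search function.
explicit = re.search(r"pineapple", "Pineapple", re.IGNORECASE)

# Without any flag the capital P prevents a match.
no_flag = re.search(r"pineapple", "Pineapple")

print(inline.group())    # Pineapple
print(explicit.group())  # Pineapple
print(no_flag)           # None
```

Both forms behave the same here; the inline one is handy when we can only hand over a pattern string and have no way to pass flags separately.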
- (?:BODY)
- You have read about it in the section Grouping. It groups RegEx expressions like plain parentheses do, but without creating a group. BODY can be any valid RegEx.
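The difference between plain parentheses and the non-capturing form is easiest to see in what a search returns. A small sketch, assuming Python's re module; the sample text "banana" is made up for the illustration.

```python
import re

text = "banana"  # assumed sample text

# Plain parentheses create group 1, so findall returns the group's content.
capturing = re.findall(r"(an)+", text)

# (?:...) only groups for the + operator, so findall returns whole matches.
non_capturing = re.findall(r"(?:an)+", text)

print(capturing)      # ['an']
print(non_capturing)  # ['anan']

# The compiled pattern reports how many groups it defines.
print(re.compile(r"(an)+").groups)    # 1
print(re.compile(r"(?:an)+").groups)  # 0
```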
- (?P<NAME>BODY), (?P=NAME)
- This extension generates a group and gives it a name, which we can use instead of the group index. The group number still works, though. Use the second variant to reference a named group within a pattern.
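A short sketch of both variants, assuming Python's re module. The group name word and the sample sentence are hypothetical. The pattern finds a word that is immediately repeated.

```python
import re

# (?P<word>...) names the group; (?P=word) requires the same text again.
pattern = re.compile(r"(?P<word>\w+) (?P=word)")

match = pattern.search("fresh kiwi kiwi juice")  # assumed sample text

by_name = match.group("word")  # access by name
by_index = match.group(1)      # the group number still works

print(match.group())  # kiwi kiwi
print(by_name)        # kiwi
print(by_index)       # kiwi
```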
- Simply a comment inside a RegEx pattern, written between (?# and the closing ). Another way to put a comment is by using
- (?=BODY)
- This is the positive lookahead assertion, a very powerful feature. It matches BODY against the following text, but it is a betweener at the same time, so it doesn't consume text for a match. E.g. the pattern \w+(?=.*eat) will tell us which fruit is edible:
plums has been eaten
I haven't tested it, but it looks like lookahead assertions can be time consuming.
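To see that the lookahead really does not consume text, we can compare it with the same pattern written without the assertion. This is a sketch assuming Python's re module, run on the sample sentence from the example.

```python
import re

text = "plums has been eaten"

# With the lookahead, ".*eat" is only checked, not made part of the match.
lookahead = re.search(r"\w+(?=.*eat)", text)

# Without it, the checked text is consumed and ends up in the match.
consuming = re.search(r"\w+.*eat", text)

print(lookahead.group())  # plums
print(lookahead.end())    # 5
print(consuming.group())  # plums has been eat
```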
- (?!BODY)
- This is the negative lookahead assertion. It matches when the following text does not match BODY.
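A minimal sketch of a negative lookahead, assuming Python's re module. The word list is made up. The pattern keeps only the words that do not start with "pe".

```python
import re

words = "plum pear peach grape"  # assumed sample text

# \b anchors at the start of a word; (?!pe) rejects the position
# when the next two letters are "pe", without consuming them.
not_pe = re.findall(r"\b(?!pe)\w+", words)

print(not_pe)  # ['plum', 'grape']
```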
- (?<=BODY)
- This is the positive lookbehind assertion. It is also a betweener, but it matches when the preceding text matches BODY. The example (?<=\d)(?=\D) illustrates the difference between lookahead and lookbehind betweeners:
33Ipeaches
As you can see, we can create new betweeners. Lookbehind assertions are definitely time consuming, because RegEx doesn't have memory and needs to look back. Another problem is that BODY must have a fixed length in lookbehind betweeners: variable-length repetition operators such as * and + are not permitted.
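The following sketch, assuming Python's re module and a made-up sample string "33peaches", shows where such a custom betweener matches and what happens when the lookbehind body has no fixed length.

```python
import re

text = "33peaches"  # assumed sample text

# Matches the empty position preceded by a digit and followed by a non-digit.
between = re.search(r"(?<=\d)(?=\D)", text)
position = between.start()

# Because the match is empty, substitution inserts text at that position.
marked = re.sub(r"(?<=\d)(?=\D)", "|", text)

# A variable-length body inside a lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=\d+)peaches")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(position)           # 2
print(marked)             # 33|peaches
print(variable_width_ok)  # False
```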
- (?<!BODY)
- The last one is the analogous negative lookbehind assertion.
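A small sketch of a negative lookbehind, assuming Python's re module. The sample text is made up. The pattern finds "ripe" only where it is not preceded by "un".

```python
import re

text = "unripe mango, ripe mango"  # assumed sample text

# (?<!un) rejects a position when the two preceding letters are "un".
starts = [m.start() for m in re.finditer(r"(?<!un)ripe", text)]

print(starts)  # [14]
```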
- (?(ID_OR_NAME)YES_BODY|NO_BODY)
- The most advanced extension is a kind of conditional statement. If the group ID_OR_NAME has taken part in the match, YES_BODY is searched in this place, else NO_BODY. Like in every conditional statement
papaya has lot of seeds
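As an illustration of the conditional extension, here is a sketch assuming Python's re module. The pattern and the group name quote are hypothetical and were chosen only for this example: a closing quote is required exactly when an opening quote was matched. The NO_BODY part is left out, which Python allows.

```python
import re

# Optional opening quote (group "quote"), a word, and a closing quote
# that is required only if the "quote" group took part in the match.
pattern = re.compile(r'(?P<quote>")?\w+(?(quote)")')

quoted = pattern.fullmatch('"papaya"')     # both quotes: match
bare = pattern.fullmatch('papaya')         # no quotes: match
unbalanced = pattern.fullmatch('"papaya')  # opening quote only: no match

print(quoted is not None)  # True
print(bare is not None)    # True
print(unbalanced)          # None
```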