Python objects

When working with RegEx in Python, we deal with three types of objects:

Strings with RegEx patterns
These are normal Python strings, except that we should write them as raw strings (check out the Formatting section).
Compiled RegEx patterns (re.Pattern; RegexObject in older documentation)
If we have just a few patterns, we don't have to care about compilation: the module functions compile them on the fly when needed and cache the result.
Match objects (re.Match; MatchObject in older documentation)
These are the results of a successful match. From them we can get the matched string, its groups, and their positions.

Compiled patterns can save execution time when they are reused. If we don't compile them ourselves, the pattern string is compiled on the fly anyway. I've tested RegEx performance for a while, and the documentation's advice seems accurate: if we have only a few patterns and use them occasionally, we don't need to compile them, because the re module keeps a cache of recently used patterns. Of course, if we match millions of times, it is good practice to compile the pattern once and skip the cache lookup.
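Here is a minimal sketch of the two styles. The pattern, sample lines and iteration count are made up for illustration. The printed timings depend on the machine and Python version, so treat them only as a rough comparison.

```python
import re
import timeit

# Compile once and reuse the pattern object.
number_re = re.compile(r"\d+")
lines = ["order 17 shipped", "no digits", "3 of 12"]

compiled = [number_re.findall(line) for line in lines]

# Module-level functions compile the pattern on the fly (and cache it),
# so the results are the same.
on_the_fly = [re.findall(r"\d+", line) for line in lines]

# Rough timing comparison; numbers vary between machines and Python versions.
t_compiled = timeit.timeit(lambda: number_re.search("abc 123"), number=100_000)
t_module = timeit.timeit(lambda: re.search(r"\d+", "abc 123"), number=100_000)

print(compiled)  # [['17'], [], ['3', '12']]
print(f"compiled: {t_compiled:.3f}s, module function: {t_module:.3f}s")
```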

A match object always evaluates to True in an if statement. If there is no match, we get None instead of a match object, and None evaluates to False in a condition. That way we can always check whether the match was successful. Other useful features of match objects are the start() and end() methods, which return the positions where a match begins and ends. When given a group number or name, they return the positions of that group instead.
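The snippet below sketches the truthiness check and the position methods. The sample string is made up. It also uses span(), which is not mentioned above and returns the (start, end) pair in one call.

```python
import re

m = re.search(r"(\d+)-(\d+)", "Call 555-1234 now")
if m:  # a match object is always truthy
    whole = m.group()              # '555-1234'
    bounds = (m.start(), m.end())  # (5, 13): positions of the whole match
    second = m.span(2)             # (9, 13): start and end of group 2

missing = re.search(r"\d", "no digits here")
if not missing:  # no match -> None, which is falsy
    print("no match")
```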

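The search() method (or module function) returns only the first (leftmost) match if a pattern matches more than once, and match() likewise returns a single match. BTW, I've never found the match() function useful: adding ^ at the beginning of a pattern passed to search() gives the same result, as long as the MULTILINE flag is not set. With MULTILINE, ^ also matches after every newline, while match() still matches only at the start of the string.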
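A short sketch comparing search(), match() and an anchored search. The strings are made-up examples. The last two lines show the MULTILINE difference between ^ and match().

```python
import re

text = "cat bat rat"

first = re.search(r"\wat", text).group()    # 'cat': only the leftmost match
at_start = re.match(r"bat", text)           # None: match() only tries the start
anchored = re.search(r"^bat", text)         # None: same result with ^
anywhere = re.search(r"bat", text).group()  # 'bat'

# With MULTILINE, ^ also matches after each newline,
# but match() still only tries the start of the string.
lines = "one\ntwo"
ml_match = re.match(r"two", lines, re.MULTILINE)              # None
ml_search = re.search(r"^two", lines, re.MULTILINE).group()  # 'two'
```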

The findall() and finditer() functions return only non-overlapping matches. What is surprising is that findall() returns only the groups if the pattern has any, not the whole match. E.g. for the string Pomelo, the pattern .(o) used in finditer() matches Po and lo, but findall() returns just o and o. If the pattern has no groups, findall() returns the whole matches; if it has several groups, it returns tuples of them. One more thing: findall() returns only the matched strings, not match objects. To get match objects, use finditer().
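The Pomelo example can be reproduced with a few lines. The extra patterns (no groups, two groups, and the "aaaa" string) are added only to illustrate the other cases.

```python
import re

text = "Pomelo"

# finditer() yields match objects; group() gives the whole match.
whole = [m.group() for m in re.finditer(r".(o)", text)]  # ['Po', 'lo']
# findall() returns only the group when the pattern has one...
groups = re.findall(r".(o)", text)                       # ['o', 'o']
# ...the whole match when there are no groups...
no_groups = re.findall(r".o", text)                      # ['Po', 'lo']
# ...and tuples when there are several groups.
tuples = re.findall(r"(.)(o)", text)                     # [('P', 'o'), ('l', 'o')]

# Matches never overlap: "aaaa" contains three "aa" substrings,
# but only two are returned.
non_overlap = re.findall(r"aa", "aaaa")                  # ['aa', 'aa']
```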

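The sub() function takes just one RegEx pattern. The other parameter, the replacement, is a normal string (best written as a raw string, like the pattern), except that \1, \2, etc. expand to the values of the corresponding groups. Named groups can be referenced as \g<name>. Instead of a replacement string we can pass a function, which is called with each match object and returns the replacement text, so the value can be constructed dynamically.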
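A small sketch of the three kinds of replacement: numbered groups, named groups, and a function. The date string and the hypothetical double() helper exist only for illustration.

```python
import re

date = "2024-03-15"

# Numbered groups in the replacement (raw string, so \1 is not an escape).
dotted = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3.\2.\1", date)  # '15.03.2024'

# Named groups referenced with \g<name>.
slashed = re.sub(
    r"(?P<y>\d+)-(?P<m>\d+)-(?P<d>\d+)", r"\g<d>/\g<m>/\g<y>", date
)  # '15/03/2024'

# A function receives each match object and returns the replacement text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")  # '6 apples and 20 pears'
```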

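In Python versions before 3.7, split() ignored empty (zero-width) matches such as $ or \b, so the separator had to match at least one character. Since Python 3.7, split() also splits on empty matches. Either way, to split text into lines use \n rather than $: with the MULTILINE flag, $ matches just before each newline, so the newline ends up at the start of the next piece. If we use parentheses in a split pattern, the captured groups are returned along with the split text. When the separator matches at the start or end of the string, an empty string is returned at that position. As with findall(), matching is done in a non-overlapping fashion from left to right.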
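The split() behaviours above can be sketched as follows. The input strings are made up, and the last line assumes Python 3.7 or later, where splitting on an empty match is allowed.

```python
import re

# Split lines on the newline character itself; a trailing separator
# produces an empty string at the end.
lines = re.split(r"\n", "one\ntwo\nthree\n")  # ['one', 'two', 'three', '']

# Capturing groups in the separator are returned between the pieces.
parts = re.split(r"\s*([,;])\s*", "a , b;c")  # ['a', ',', 'b', ';', 'c']

# A separator at the start produces a leading empty string.
leading = re.split(r",", ",x,y")  # ['', 'x', 'y']

# Python 3.7+: empty matches split too, but $ leaves the newline
# attached to the following piece.
dollar = re.split(r"$", "a\nb", flags=re.MULTILINE)  # ['a', '\nb', '']
```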