Groups in RegEx match

A very common use of RegEx is finding a piece of text that is surrounded by specific content. For example, how do we find the green fruit in this sentence: yellow banana, green avocado, red rambutan? If we use parentheses to mark a piece of the pattern, as in green (\w+), we get an additional match inside the RegEx match: the whole match is green avocado and the part inside the parentheses is avocado. That part is called a group, and we can access it by number or by name.
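As a minimal sketch of the example above, assuming Python's built-in re module: the group name fruit in the second pattern is only an illustrative choice, not something the original example defines.

```python
import re

text = "yellow banana, green avocado, red rambutan"

# The parentheses mark group no. 1 inside the pattern.
match = re.search(r"green (\w+)", text)
whole = match.group(0)   # the whole match
fruit = match.group(1)   # only the part inside the parentheses

# The same group with a name ("fruit" is a hypothetical name for this sketch).
named = re.search(r"green (?P<fruit>\w+)", text)
fruit_by_name = named.group("fruit")

print(whole)          # green avocado
print(fruit)          # avocado
print(fruit_by_name)  # avocado
```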

In the example above we have just one group, group no. 1. Keep in mind that groups are numbered from the left side of the pattern by counting opening parentheses. E.g. in the pattern (^(\w+) (\w+)), groups no. 2 and 3 are nested inside group no. 1, the biggest one.
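The numbering rule can be checked with a short sketch. The sample text here is an assumption, reusing the beginning of the earlier sentence.

```python
import re

text = "yellow banana, green avocado"

# Opening parentheses, counted from the left, give the group numbers:
# 1 = the outer group, 2 = the first word, 3 = the second word.
match = re.search(r"(^(\w+) (\w+))", text)
outer = match.group(1)
first_word = match.group(2)
second_word = match.group(3)

print(outer)        # yellow banana
print(first_word)   # yellow
print(second_word)  # banana
```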

To access the group value we can use the group() or groups() method of the match object. As you can see, the names are quite unfortunate: they are very similar, and so is the functionality, which makes them misleading. group(n) returns the value of group no. n, while groups() returns a tuple with the values of all groups. Keep in mind that group(0) and group() (without arguments) return the whole match, not a group written in parentheses. So group is actually the main function for getting the whole match! A bad name. Unfortunately the name match is already taken.
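The difference between the two methods is easier to see side by side. This is a small sketch with an assumed two-word sample string.

```python
import re

match = re.search(r"(\w+) (\w+)", "green avocado")

whole_default = match.group()   # no argument: the whole match
whole_zero = match.group(0)     # same as group()
first = match.group(1)          # value of group no. 1
all_groups = match.groups()     # tuple with every group, without the whole match

print(whole_default)  # green avocado
print(first)          # green
print(all_groups)     # ('green', 'avocado')
```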

We can also access a group value during the matching process itself. With a backslash followed by the group number, as in (["']).*\1, we refer back to what the group has already matched. In this example it finds text enclosed in matching quotes: in old "oranges", smelly 'dragon fruit' the pattern matches "oranges" and 'dragon fruit'. That is the only kind of memory RegEx can use during a match. So don't expect too much: RegEx is not a programming language, even if it looks that complicated.
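A sketch of the backreference example in Python. It uses finditer() rather than findall(), because findall() would return only the content of group 1 (the quote character) when the pattern contains a single group.

```python
import re

text = "old \"oranges\", smelly 'dragon fruit'"

# Group 1 captures the opening quote; \1 requires the same character to close it.
pattern = r"""(["']).*\1"""

quoted = [m.group(0) for m in re.finditer(pattern, text)]
quote_chars = [m.group(1) for m in re.finditer(pattern, text)]

print(quoted)       # ['"oranges"', "'dragon fruit'"]
print(quote_chars)  # ['"', "'"]
```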

If a group matches more than once within a single match, e.g. (\w+berry\W*)+ in strawberry, blueberry, roseberry, only the last value of the group (here roseberry) is accessible. To access every instance, we need to drop the outer + and use the finditer() or findall() function, which return every non-overlapping match of the pattern.
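A short sketch of both behaviours. The simplified pattern \w+berry passed to findall() and finditer() is an assumption about how the repeated group would be rewritten, since the outer + is dropped.

```python
import re

text = "strawberry, blueberry, roseberry"

# A repeated group keeps only its last value.
match = re.search(r"(\w+berry\W*)+", text)
last_value = match.group(1)

# Without the outer repetition, every occurrence is a separate match.
all_berries = re.findall(r"\w+berry", text)
positions = [m.start() for m in re.finditer(r"\w+berry", text)]

print(last_value)   # roseberry
print(all_berries)  # ['strawberry', 'blueberry', 'roseberry']
print(positions)    # [0, 12, 23]
```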