Created
October 10, 2021 10:43
Demonstration of regular expression (?(DEFINE) ...) for declaring reusable sub-patterns
#!/usr/bin/env python3
import regex

PATTERN = regex.compile(r'''
    # Declare reusable sub-patterns
    (?(DEFINE)
        (?<COUNT>[0-9]+)
        (?<CURRENCY>EUR|GBP|USD)
        (?<QUANTITY>[0-9]+[.][0-9]+)
        (?<SEP>[\t ])
    )
    # The pattern to match
    (?<name>[A-Za-z]+)
    (?&SEP)+
    (?<currency>(?&CURRENCY))
    (?&SEP)
    (?<price>(?&QUANTITY))
    (?&SEP)
    (?<discount_pct>(?&QUANTITY))%
    (?&SEP)
    (?<quantity>(?&COUNT))
    ''',
    regex.VERBOSE,
)

m = PATTERN.match('Pencils\t\tEUR 0.10 5.00% 1000')
print({k: v for k, v in m.groupdict().items() if k == k.lower()})
moreati (author) commented on Oct 10, 2021
- sub-pattern names don't have to be uppercase, but I find it a handy convention
- sub-pattern names are included in m.groupdict(), hence the extra filtering
- (?(DEFINE) is not supported by Python's stdlib re module. This uses https://pypi.org/project/regex/
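
For comparison, here is a minimal sketch of one way to get similar reuse with only the stdlib re module. It is not part of the gist: it assumes the sub-patterns can be kept as plain Python strings and interpolated into the pattern with an f-string. The fragment names mirror the DEFINE block above. The (?P<name>...) group syntax and the non-capturing (?:...) wrappers are choices made for this sketch.

```python
import re

# Reusable fragments as plain strings (names mirror the gist's DEFINE block).
COUNT = r'[0-9]+'
CURRENCY = r'EUR|GBP|USD'
QUANTITY = r'[0-9]+[.][0-9]+'
SEP = r'[\t ]'

# Fragments are interpolated textually, so each one is wrapped in a group
# to keep alternations and quantifiers scoped to that fragment.
PATTERN = re.compile(rf'''
    (?P<name>[A-Za-z]+)
    (?:{SEP})+
    (?P<currency>{CURRENCY})
    (?:{SEP})
    (?P<price>{QUANTITY})
    (?:{SEP})
    (?P<discount_pct>{QUANTITY})%
    (?:{SEP})
    (?P<quantity>{COUNT})
    ''',
    re.VERBOSE,
)

m = PATTERN.match('Pencils\t\tEUR 0.10 5.00% 1000')
if m is not None:
    # Only the lowercase groups exist here, so no filtering is needed.
    print(m.groupdict())
```

Because the fragments are pasted in as text rather than declared as named groups, they never show up in m.groupdict(). The trade-off is that the pattern is no longer a single self-contained string, and a fragment containing literal braces would need escaping inside the f-string.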