@moreati
Created October 10, 2021 10:43
Demonstration of regular expression (?(DEFINE) ...) for declaring reusable sub-patterns
#!/usr/bin/env python3
import regex

PATTERN = regex.compile(r'''
    # Declare reusable sub-patterns
    (?(DEFINE)
        (?<COUNT>[0-9]+)
        (?<CURRENCY>EUR|GBP|USD)
        (?<QUANTITY>[0-9]+[.][0-9]+)
        (?<SEP>[\t ])
    )

    # The pattern to match
    (?<name>[A-Za-z]+)
    (?&SEP)+
    (?<currency>(?&CURRENCY))
    (?&SEP)
    (?<price>(?&QUANTITY))
    (?&SEP)
    (?<discount_pct>(?&QUANTITY))%
    (?&SEP)
    (?<quantity>(?&COUNT))
    ''',
    regex.VERBOSE,
)

m = PATTERN.match('Pencils\t\tEUR 0.10 5.00% 1000')
print({k: v for k, v in m.groupdict().items() if k == k.lower()})

moreati commented Oct 10, 2021

$ python3 regex_define.py
{'name': 'Pencils', 'currency': 'EUR', 'price': '0.10', 'discount_pct': '5.00', 'quantity': '1000'}


moreati commented Oct 10, 2021

  • sub-pattern names don't have to be uppercase, but I find it a handy convention
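  • sub-pattern names are included as keys in m.groupdict(), hence the extra filtering (k == k.lower() keeps only the lowercase group names)
  • (?(DEFINE) ...) is not supported by Python's stdlib re module. This script uses the third-party regex package: https://pypi.org/project/regex/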
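
If installing regex is not an option, a rough approximation with the stdlib re module is to keep the sub-patterns as ordinary Python strings and interpolate them into the main pattern. This is only a sketch and not part of the original script: the constants COUNT, CURRENCY, QUANTITY and SEP below are plain string fragments chosen to mirror the (?(DEFINE) ...) names, and re needs the (?P<name>...) spelling for named groups.

```python
#!/usr/bin/env python3
import re

# Reusable fragments as plain strings (stand-ins for the DEFINE block).
# Alternations are wrapped in (?:...) so they stay intact when pasted in.
COUNT = r'[0-9]+'
CURRENCY = r'(?:EUR|GBP|USD)'
QUANTITY = r'[0-9]+[.][0-9]+'
SEP = r'[\t ]'

# The fragments are substituted textually before the pattern is compiled.
PATTERN = re.compile(rf'''
    (?P<name>[A-Za-z]+)
    {SEP}+
    (?P<currency>{CURRENCY})
    {SEP}
    (?P<price>{QUANTITY})
    {SEP}
    (?P<discount_pct>{QUANTITY})%
    {SEP}
    (?P<quantity>{COUNT})
    ''',
    re.VERBOSE,
)

m = PATTERN.match('Pencils\t\tEUR 0.10 5.00% 1000')
result = m.groupdict()
print(result)
```

Because the fragments are not groups here, groupdict() only contains the five lowercase names and needs no filtering. The trade-off is that this is text substitution rather than a sub-pattern call, so each fragment is repeated in the compiled pattern and recursive sub-patterns cannot be expressed this way.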
