@felipeochoa
Last active August 29, 2015 14:23
A regular expression to tokenize Excel formulae
import re
TOKEN_RE = re.compile(r"""
(?P<scientific>[1-9](?:\.[0-9]+)?[Ee][+-]?[1-9][0-9]*)|
(?P<string>"(?:[^"]*"")*[^"]*"(?!"))|
(?P<link>'(?:[^']*'')*[^']*'(?!'))|
(?P<sq_bracket>\[[^]]*\])|
(?P<error>\#NULL!|\#DIV/0!|\#VALUE!|\#REF!|\#NAME\?|\#NUM!|\#N/A)|
(?P<wspace>\ +)|
(?P<operator>[-+*/^&=%]|>=|<=|<>|>(?!=)|<(?![=>]))|
(?P<func_opener>[^-"'[# +*/^&=><%{()};,]+\()|
(?P<lit_opener>[{(])|
(?P<closer>[\])}])|
(?P<separator>[,;])|
(?P<operand>[^-"'[# +*/^&=><%{()};,]+(?!:\())
""", re.VERBOSE)


def as_token(match):
    "Convert a Match object into a token tuple."
    # Every named group is a top-level alternative, so exactly one is non-None.
    for key, val in match.groupdict().items():
        if val is not None:
            return (val, key.upper())


def tokenize(formula):
    "Tokenize a formula."
    for match in TOKEN_RE.finditer(formula):
        yield as_token(match)
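
The same named-group technique can be tried in isolation with a much smaller pattern. The sketch below is not part of the gist: `MINI_RE` and `mini_tokenize` are hypothetical names, the pattern covers only a handful of token types, and it uses `Match.lastgroup` rather than scanning `groupdict()` to find which alternative matched.

```python
import re

# Hypothetical, simplified pattern: far fewer token types than TOKEN_RE.
# Inner groups are non-capturing so that lastgroup is always a branch name.
MINI_RE = re.compile(r"""
(?P<number>[0-9]+(?:\.[0-9]+)?)|
(?P<operator>[-+*/^&=])|
(?P<func_opener>[A-Za-z]+\()|
(?P<closer>\))|
(?P<separator>[,;])|
(?P<operand>[A-Za-z$]+[0-9$]*(?::[A-Za-z$]+[0-9$]*)?)
""", re.VERBOSE)


def mini_tokenize(formula):
    "Yield (text, TYPE) pairs; lastgroup names the alternative that matched."
    for match in MINI_RE.finditer(formula):
        yield (match.group(), match.lastgroup.upper())


tokens = list(mini_tokenize("=SUM(A1:B2,3)"))
for token in tokens:
    print(token)
# ('=', 'OPERATOR')
# ('SUM(', 'FUNC_OPENER')
# ('A1:B2', 'OPERAND')
# (',', 'SEPARATOR')
# ('3', 'NUMBER')
# (')', 'CLOSER')
```

Note that `finditer` silently skips any character that no alternative matches, so a production tokenizer would probably want to check that consecutive matches are contiguous.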