@lucaswiman
Created November 27, 2015 21:52
Example: Unicode handling with Parsimonious
>>> from parsimonious.grammar import Grammar
>>>
>>> QUOTE_UNQUOTE_GRAMMAR = Grammar(u'''
... text = (unquoted / quoted)+
...
... unquoted = ~u"[^\u201c\u201d]+"
... quoted = ldquo text rdquo
... ldquo = ~u"\u201c"
... rdquo = ~u"\u201d"
... ''')
>>>
>>>
>>> QUOTE_EXAMPLES = [
... u'Unquoted.',
... u'“What?”',
... u'“Well,” he explained, “that depends on what the meaning of the word “is” is.”',
... u'“🍣?”. “😊!”'
... ]
>>>
>>> for ex in QUOTE_EXAMPLES:
...     print(ex)
...     print(QUOTE_UNQUOTE_GRAMMAR.parse(ex))
...
Unquoted.
<Node called "text" matching "Unquoted.">
    <Node matching "Unquoted.">
        <RegexNode called "unquoted" matching "Unquoted.">
“What?”
<Node called "text" matching "“What?”">
    <Node matching "“What?”">
        <Node called "quoted" matching "“What?”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "What?">
                <Node matching "What?">
                    <RegexNode called "unquoted" matching "What?">
            <RegexNode called "rdquo" matching "”">
“Well,” he explained, “that depends on what the meaning of the word “is” is.”
<Node called "text" matching "“Well,” he explained, “that depends on what the meaning of the word “is” is.”">
    <Node matching "“Well,”">
        <Node called "quoted" matching "“Well,”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "Well,">
                <Node matching "Well,">
                    <RegexNode called "unquoted" matching "Well,">
            <RegexNode called "rdquo" matching "”">
    <Node matching " he explained, ">
        <RegexNode called "unquoted" matching " he explained, ">
    <Node matching "“that depends on what the meaning of the word “is” is.”">
        <Node called "quoted" matching "“that depends on what the meaning of the word “is” is.”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "that depends on what the meaning of the word “is” is.">
                <Node matching "that depends on what the meaning of the word ">
                    <RegexNode called "unquoted" matching "that depends on what the meaning of the word ">
                <Node matching "“is”">
                    <Node called "quoted" matching "“is”">
                        <RegexNode called "ldquo" matching "“">
                        <Node called "text" matching "is">
                            <Node matching "is">
                                <RegexNode called "unquoted" matching "is">
                        <RegexNode called "rdquo" matching "”">
                <Node matching " is.">
                    <RegexNode called "unquoted" matching " is.">
            <RegexNode called "rdquo" matching "”">
“🍣?”. “😊!”
<Node called "text" matching "“🍣?”. “😊!”">
    <Node matching "“🍣?”">
        <Node called "quoted" matching "“🍣?”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "🍣?">
                <Node matching "🍣?">
                    <RegexNode called "unquoted" matching "🍣?">
            <RegexNode called "rdquo" matching "”">
    <Node matching ". ">
        <RegexNode called "unquoted" matching ". ">
    <Node matching "“😊!”">
        <Node called "quoted" matching "“😊!”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "😊!">
                <Node matching "😊!">
                    <RegexNode called "unquoted" matching "😊!">
            <RegexNode called "rdquo" matching "”">
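The quote/unquote grammar above can also be mirrored without Parsimonious. What follows is a minimal Python 3 sketch that uses only the standard `re` module. It is a hand-written recursive-descent parser with one branch per grammar rule, a swapped-in technique rather than how Parsimonious works internally. The names `parse_text` and `parse` are hypothetical. Instead of `Node` objects, the sketch returns nested lists: plain strings stand for `unquoted` runs, and nested lists stand for `quoted` spans.

```python
import re

LDQUO, RDQUO = "\u201c", "\u201d"
# Same negated character class as the grammar's `unquoted` rule:
# one or more characters that are neither a left nor a right curly quote.
UNQUOTED = re.compile("[^\u201c\u201d]+")


def parse_text(s, pos=0):
    """Mirror `text = (unquoted / quoted)+` starting at `pos`.

    Returns (pieces, new_pos): plain strings for `unquoted` runs and
    nested lists for `quoted` spans. Hypothetical helper, not Parsimonious API.
    """
    pieces = []
    while pos < len(s):
        m = UNQUOTED.match(s, pos)
        if m:
            # `unquoted` is tried first, like the ordered choice `/`.
            pieces.append(m.group())
            pos = m.end()
        elif s[pos] == LDQUO:
            # `quoted = ldquo text rdquo`: recurse for the inner text.
            inner, pos = parse_text(s, pos + 1)
            if pos >= len(s) or s[pos] != RDQUO:
                raise ValueError("expected closing quote at %d" % pos)
            pieces.append(inner)
            pos += 1
        else:
            # A right quote ends this `text`; the caller consumes it.
            break
    if not pieces:
        # `+` requires at least one piece, so an empty run is an error.
        raise ValueError("expected text at %d" % pos)
    return pieces, pos


def parse(s):
    """Parse a whole string, rejecting leftover input such as a stray right quote."""
    pieces, pos = parse_text(s)
    if pos != len(s):
        raise ValueError("unconsumed input at %d" % pos)
    return pieces


if __name__ == "__main__":
    for ex in ["Unquoted.", "“What?”", "“🍣?”. “😊!”"]:
        print(ex)
        print(parse(ex))
```

On the third example string the sketch returns `[['Well,'], ' he explained, ', ['that depends on what the meaning of the word ', ['is'], ' is.']]`, the same nesting as the `quoted` nodes in the Parsimonious tree. Because Python 3 strings are sequences of code points, each emoji in the last example is a single character to the `re` character class.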