lucaswiman/5a79d06c309f12268e6e
Created November 27, 2015 21:52
example unicode handling
>>> from parsimonious.grammar import Grammar
>>>
>>> QUOTE_UNQUOTE_GRAMMAR = Grammar(u'''
...     text = (unquoted / quoted)+
...
...     unquoted = ~u"[^\u201c\u201d]+"
...     quoted = ldquo text rdquo
...     ldquo = ~u"\u201c"
...     rdquo = ~u"\u201d"
... ''')
>>>
>>>
>>> QUOTE_EXAMPLES = [
...     u'Unquoted.',
...     u'“What?”',
...     u'“Well,” he explained, “that depends on what the meaning of the word “is” is.”',
...     u'“🍣?”. “😊!”'
... ]
>>>
>>> for ex in QUOTE_EXAMPLES:
...     print(ex)
...     print(QUOTE_UNQUOTE_GRAMMAR.parse(ex))
...
Unquoted.
<Node called "text" matching "Unquoted.">
    <Node matching "Unquoted.">
        <RegexNode called "unquoted" matching "Unquoted.">
“What?”
<Node called "text" matching "“What?”">
    <Node matching "“What?”">
        <Node called "quoted" matching "“What?”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "What?">
                <Node matching "What?">
                    <RegexNode called "unquoted" matching "What?">
            <RegexNode called "rdquo" matching "”">
“Well,” he explained, “that depends on what the meaning of the word “is” is.”
<Node called "text" matching "“Well,” he explained, “that depends on what the meaning of the word “is” is.”">
    <Node matching "“Well,”">
        <Node called "quoted" matching "“Well,”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "Well,">
                <Node matching "Well,">
                    <RegexNode called "unquoted" matching "Well,">
            <RegexNode called "rdquo" matching "”">
    <Node matching " he explained, ">
        <RegexNode called "unquoted" matching " he explained, ">
    <Node matching "“that depends on what the meaning of the word “is” is.”">
        <Node called "quoted" matching "“that depends on what the meaning of the word “is” is.”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "that depends on what the meaning of the word “is” is.">
                <Node matching "that depends on what the meaning of the word ">
                    <RegexNode called "unquoted" matching "that depends on what the meaning of the word ">
                <Node matching "“is”">
                    <Node called "quoted" matching "“is”">
                        <RegexNode called "ldquo" matching "“">
                        <Node called "text" matching "is">
                            <Node matching "is">
                                <RegexNode called "unquoted" matching "is">
                        <RegexNode called "rdquo" matching "”">
                <Node matching " is.">
                    <RegexNode called "unquoted" matching " is.">
            <RegexNode called "rdquo" matching "”">
“🍣?”. “😊!”
<Node called "text" matching "“🍣?”. “😊!”">
    <Node matching "“🍣?”">
        <Node called "quoted" matching "“🍣?”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "🍣?">
                <Node matching "🍣?">
                    <RegexNode called "unquoted" matching "🍣?">
            <RegexNode called "rdquo" matching "”">
    <Node matching ". ">
        <RegexNode called "unquoted" matching ". ">
    <Node matching "“😊!”">
        <Node called "quoted" matching "“😊!”">
            <RegexNode called "ldquo" matching "“">
            <Node called "text" matching "😊!">
                <Node matching "😊!">
                    <RegexNode called "unquoted" matching "😊!">
            <RegexNode called "rdquo" matching "”">
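
For readers who want to see why the nested quotes above resolve the way they do, here is a minimal hand-written sketch of the same three rules in Python 3, using only the standard `re` module. It is not how parsimonious is implemented, and the names `parse`, `parse_text`, and `parse_quoted` and the tuple-based tree shape are assumptions made for this illustration. It only mirrors the greedy, ordered-choice behaviour of `text = (unquoted / quoted)+` and `quoted = ldquo text rdquo`.

```python
import re

# Curly quotes, matching ldquo / rdquo in the grammar above.
LDQUO, RDQUO = "\u201c", "\u201d"
# Same character class as the `unquoted` rule.
UNQUOTED = re.compile("[^\u201c\u201d]+")


def parse_text(s, pos=0):
    """text = (unquoted / quoted)+ ; returns (children, new_pos) or None."""
    children = []
    while pos < len(s):
        m = UNQUOTED.match(s, pos)
        if m:  # first alternative: a run of non-quote characters
            children.append(("unquoted", m.group()))
            pos = m.end()
            continue
        quoted = parse_quoted(s, pos)  # second alternative
        if quoted is None:
            break
        node, pos = quoted
        children.append(node)
    # The `+` quantifier requires at least one match.
    return (children, pos) if children else None


def parse_quoted(s, pos):
    """quoted = ldquo text rdquo ; returns (node, new_pos) or None."""
    if not s.startswith(LDQUO, pos):
        return None
    inner = parse_text(s, pos + 1)
    if inner is None:
        return None
    children, pos = inner
    if not s.startswith(RDQUO, pos):
        return None
    return ("quoted", children), pos + 1


def parse(s):
    """Parse the whole string or raise ValueError (hypothetical helper)."""
    result = parse_text(s)
    if result is None or result[1] != len(s):
        raise ValueError("input does not match the grammar")
    return result[0]
```

Calling `parse('“What?”')` yields a single `("quoted", ...)` node wrapping one `("unquoted", "What?")` leaf, which corresponds to the second tree printed above with the anonymous wrapper nodes omitted. Because Python 3 strings are sequences of code points, the emoji examples need no special handling here.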