Created May 1, 2013 01:45
Generator for parsing text into fields such as command line arguments or raw CSV.
def split_fields(text, delimiters = " \t\n\r", quotes = "\"'`"):
    """
    Field parsing generator; default parameters will function similar
    to command line argument parsing.
    Ex. >>> [f for f in split_fields("this is\t`a te's't` ")]
        ['this', 'is', "a te's't"]
    """
    index = -1
    start = -1
    end = len(text) - 1
    in_quote = ""
    while index < end:
        index += 1
        char = text[index]
        if char in delimiters and not in_quote:
            if start >= 0:
                yield text[start:index]
                start = -1
        elif char in quotes:
            if start < 0:
                start = index
                in_quote = char
            elif char == in_quote:
                yield text[start+1:index]
                start = -1
                in_quote = ""
        elif start < 0:
            start = index
    if start >= 0:
        yield text[start:]
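
Since the docstring compares the default behavior to command line argument parsing, it may help to see how the standard library's `shlex.split` handles similar input. This is only a point of comparison, not part of the gist. `shlex` follows POSIX shell quoting rules and does not treat backticks as quote characters, so the sketch below uses double quotes instead.

```python
import shlex

# shlex.split applies POSIX shell rules: single and double quotes group
# text into one field, and a single quote inside double quotes is literal.
fields = shlex.split("this is\t\"a te's't\"  ")
print(fields)  # ['this', 'is', "a te's't"]
```

Unlike the generator above, `shlex.split` builds the whole list at once and raises `ValueError` on an unterminated quote.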
This was mostly an educational endeavor as the same could be done much more simply with a regular expression:
But a regex is much harder to read and possibly slower (I didn't bother speed testing my generator against a regex method).
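
As a rough sketch of what a regex-based equivalent might look like (not necessarily the expression the author had in mind): the names `FIELD_RE` and `split_fields_re` are hypothetical, and the pattern hard-codes the default delimiters and quotes of `split_fields` rather than taking them as parameters.

```python
import re

# Hypothetical pattern; assumes the default delimiters " \t\n\r" and quotes "'`.
FIELD_RE = re.compile(
    r"""
    (?P<quote>["'`])(?P<quoted>.*?)(?P=quote)   # quoted field, up to the matching quote
    | (?P<bare>[^ \t\n\r"'`][^ \t\n\r]*)        # bare field, up to the next delimiter
    | (?P<open>["'`].*)                         # unterminated quote: rest of the text
    """,
    re.VERBOSE | re.DOTALL,
)


def split_fields_re(text):
    """Yield fields from text using FIELD_RE (sketch for default settings only)."""
    for match in FIELD_RE.finditer(text):
        if match.group("quote") is not None:
            # Closed quote: yield the contents without the quote characters.
            yield match.group("quoted")
        elif match.group("bare") is not None:
            yield match.group("bare")
        else:
            # No closing quote: yield the remainder, opening quote included.
            yield match.group("open")


print(list(split_fields_re("this is\t`a te's't`  ")))
# ['this', 'is', "a te's't"]
```

The three alternatives correspond to the generator's three outcomes: a closed quoted field, an unquoted field (where quote characters in the middle are kept literally), and a trailing unterminated quote.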