@whutch
Created May 1, 2013 01:45
Generator for parsing text into fields such as command line arguments or raw CSV.
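def split_fields(text, delimiters=" \t\n\r", quotes="\"'`"):
    """
    Field parsing generator; with the default parameters it behaves
    similarly to command line argument parsing.

    >>> list(split_fields("this is\t`a te's't` "))
    ['this', 'is', "a te's't"]
    """
    index = -1
    start = -1  # index where the current field begins, or -1 if not in a field
    end = len(text) - 1
    in_quote = ""  # the quote character that opened the current field, if any
    while index < end:
        index += 1
        char = text[index]
        if char in delimiters and not in_quote:
            # An unquoted delimiter ends the current field, if there is one.
            if start >= 0:
                yield text[start:index]
                start = -1
        elif char in quotes:
            if start < 0:
                # A quote at the start of a field opens a quoted field.
                start = index
                in_quote = char
            elif char == in_quote:
                # The matching quote closes the field; drop both quote marks.
                yield text[start+1:index]
                start = -1
                in_quote = ""
        elif start < 0:
            start = index
    # Emit a trailing field (including an unterminated quoted one) as-is.
    if start >= 0:
        yield text[start:]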

whutch commented May 1, 2013

This was mostly an educational exercise, since roughly the same result can be had much more simply with a regular expression:

import re

[m[0] if not m[1] else m[0][1:-1] for m in \
  re.findall(r"(([\"'`]).*?\2|[^\s\"'`]+)", "this   is\t`a te's't` ")]

But the regex is much harder to read and possibly slower (I didn't bother benchmarking my generator against a regex method).
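For anyone who wants to check the speed question, here is a rough sketch of how the two approaches could be compared with `timeit`. The generator is repeated so the snippet runs on its own; `split_fields_regex`, `FIELD_RE`, `sample`, and the repetition count are names and values made up for this illustration, not part of the gist. No timing results are claimed here, since they depend on the input and the machine.

```python
import re
import timeit


def split_fields(text, delimiters=" \t\n\r", quotes="\"'`"):
    """Copy of the gist's generator, repeated so this sketch is self-contained."""
    index = -1
    start = -1
    end = len(text) - 1
    in_quote = ""
    while index < end:
        index += 1
        char = text[index]
        if char in delimiters and not in_quote:
            if start >= 0:
                yield text[start:index]
                start = -1
        elif char in quotes:
            if start < 0:
                start = index
                in_quote = char
            elif char == in_quote:
                yield text[start + 1:index]
                start = -1
                in_quote = ""
        elif start < 0:
            start = index
    if start >= 0:
        yield text[start:]


# Hypothetical helper: the regex from the comment, precompiled once.
FIELD_RE = re.compile(r"(([\"'`]).*?\2|[^\s\"'`]+)")


def split_fields_regex(text):
    # Group 2 is non-empty only for quoted matches; strip their quote marks.
    return [m[0][1:-1] if m[1] else m[0] for m in FIELD_RE.findall(text)]


sample = "this   is\t`a te's't` "

# Both approaches agree on the sample input from the comment.
assert list(split_fields(sample)) == split_fields_regex(sample)

# They are not equivalent everywhere: a lone apostrophe inside a word is
# kept by the generator but splits the word in the regex version.
assert list(split_fields("it's here")) == ["it's", "here"]
assert split_fields_regex("it's here") == ["it", "s", "here"]

runs = 20000  # arbitrary repetition count chosen for this sketch
t_gen = timeit.timeit(lambda: list(split_fields(sample)), number=runs)
t_re = timeit.timeit(lambda: split_fields_regex(sample), number=runs)
print(f"generator: {t_gen:.3f}s  regex: {t_re:.3f}s  ({runs} runs each)")
```

The two assertions on `"it's here"` show why "the same" is only approximate: any benchmark should use inputs on which both versions produce identical fields.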
