@cgranade
Created April 28, 2015 00:23
(*Suppose that data is stored in a CSV-like format, with newline-delimited rows of comma-separated fields.*)
StringSplit[#, ","]& /@ StringSplit["a,b,c\nd,e,f", "\n"]
(*If a field is missing, let's suppose this is represented by an empty string as the contents of that field.*)
StringSplit[#, ","]& /@ StringSplit["a,,c\nd,e,f", "\n"]
(*This is inconvenient, so let's write a function that parses empty fields as the special value Missing[].*)
MarkMissingValues[dataStr_] := Map[
  Function[row, StringSplit[row, ","]],
  StringSplit[dataStr, "\n"]
] /. "" -> Missing[]
MarkMissingValues["a,b,c\nd,,f"]
(*This works great, right? Look what happens when a missing value is at the beginning or end of a row: the empty field is silently dropped.*)
MarkMissingValues[",b,c\nd,e,f"]
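(*A quick check of the problem, as a sketch: every row of this input has three fields, so comparing the row lengths returned by the definition above exposes any row that lost a field.*)
Length /@ MarkMissingValues[",b,c\nd,e,f"]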
(*The transformation performed by StringSplit is, by default, not reversible: empty substrings at the beginning or end of the string are discarded.*)
StringRiffle[StringSplit["a,b,c", ","], ","]
StringRiffle[StringSplit[",b,c", ","], ","]
(*There is, however, a poorly-documented (or rather, accurately documented but hidden) third argument to StringSplit that makes it preserve empty substrings at the beginning and end of a string:*)
StringRiffle[StringSplit[",b,c", ",", All], ","]
(*Using this argument, we can now implement our parser correctly.*)
MarkMissingValues[dataStr_] := Map[
  Function[row, StringSplit[row, ",", All]],
  StringSplit[dataStr, "\n", All]
] /. "" -> Missing[]
MarkMissingValues[",b,c\nd,e,f"]
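(*A sketch of the reverse direction. The helper name UnmarkMissingValues is hypothetical and not part of the original gist. It turns each Missing[] back into an empty string, then rejoins fields with commas and rows with newlines. If the parser above preserves every field, this comparison should confirm that the round trip reproduces the input string.*)
UnmarkMissingValues[rows_] := StringRiffle[rows /. _Missing -> "", "\n", ","]
UnmarkMissingValues[MarkMissingValues[",b,c\nd,e,f"]] === ",b,c\nd,e,f"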