Sometimes, you’d like to look at delimited files on the command line:
cat test.csv
date,name,status tomorrow,Porthy,melancholy yesterday,,morose someday,, Mon,Echawr,ecstatic ,Llano,blue ,Llany,suspicious ,,sleeping ,Peny, ,Llandy,piqued yesterday,Llaisy,oppressed
This is often annoying. Because the columns don’t line up, it’s hard to see what values go with what column. Fortunately, there’s a fantastic command line utility that will make aligned tables for you:
cat test.csv | column -s , -tx
date name status tomorrow Porthy melancholy yesterday morose someday Mon Echawr ecstatic Llano blue Llany suspicious sleeping Peny Llandy piqued yesterday Llaisy oppressed
Oh. Except column
doesn’t understand how to handle empty fields. So this is actually worse than before, because now we have a result that looks like it’s correctly aligned, but actually is not. column
silently fails. (Although I have since discovered that this behavior of column
was a bug, and it has been fixed in at least some distributions.)
I suppose we could fix this by just using sed
to add some sort of ‘this field was empty’ value to the empty fields, and then columnate:
cat test.csv | sed "s/,,/,NULL,/g" | column -s , -tx
date name status tomorrow Porthy melancholy yesterday NULL morose someday NULL Mon Echawr ecstatic Llano blue Llany suspicious NULL sleeping Peny Llandy piqued yesterday Llaisy oppressed
Oh. That almost works, but we forgot to consider all the possible cases. We need to worry about leading empty fields (beginning of line followed by delimiter), consecutive internal empty fields (delimiter delimiter delimiter), and trailing empty fields (delimiter followed by newline). Like this:
cat test.csv | sed "s/,,/,NULL,/g" | sed "s/,,/,NULL,/g" | sed "s/^,/NULL,/g" | sed "s/,$/,NULL/g" | column -s , -tx
date name status tomorrow Porthy melancholy yesterday NULL morose someday NULL NULL Mon Echawr ecstatic NULL Llano blue NULL Llany suspicious NULL NULL sleeping NULL Peny NULL NULL Llandy piqued yesterday Llaisy oppressed
Nice. Although … that command line seems very complicated. Better to spend a few hours learning about zparseopts
and abstract it into a shell function:
source fillna.zsh
cat test.csv | fillna --separator , --fill-value NULL | column -s , -tx
date name status tomorrow Porthy melancholy yesterday NULL morose someday NULL NULL Mon Echawr ecstatic NULL Llano blue NULL Llany suspicious NULL NULL sleeping NULL Peny NULL NULL Llandy piqued yesterday Llaisy oppressed
Very nice. Although … this seems like a problem that other people must have encountered before.
Ah. Yes, yes it is. Better to spend a couple of days learning the DSLs for miller
, tab
, xsv
, and q
to conduct a comprehensive evaluation …