Skip to content

Instantly share code, notes, and snippets.

@DGrady
Last active March 11, 2019 19:10
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save DGrady/c80fae1398ffaad0c567beffd258db9f to your computer and use it in GitHub Desktop.
Save DGrady/c80fae1398ffaad0c567beffd258db9f to your computer and use it in GitHub Desktop.
Pretty printing delimited text files at the command line

Pretty printing delimited text files at the command line

Sometimes, you’d like to look at delimited files on the command line:

cat test.csv
date,name,status
tomorrow,Porthy,melancholy
yesterday,,morose
someday,,
Mon,Echawr,ecstatic
,Llano,blue
,Llany,suspicious
,,sleeping
,Peny,
,Llandy,piqued
yesterday,Llaisy,oppressed

This is often annoying. Because the columns don’t line up, it’s hard to see what values go with what column. Fortunately, there’s a fantastic command line utility that will make aligned tables for you:

cat test.csv | column -s , -tx
date       name        status
tomorrow   Porthy      melancholy
yesterday  morose
someday
Mon        Echawr      ecstatic
Llano      blue
Llany      suspicious
sleeping
Peny
Llandy     piqued
yesterday  Llaisy      oppressed

Oh. Except column doesn’t understand how to handle empty fields. So this is actually worse than before, because now we have a result that looks like it’s correctly aligned, but actually is not. column silently fails. (Although I have since discovered that this behavior of column was a bug, and it has been fixed in at least some distributions.)

I suppose we could fix this by just using sed to add some sort of ‘this field was empty’ value to the empty fields, and then columnate:

cat test.csv | sed "s/,,/,NULL,/g" | column -s , -tx
date       name        status
tomorrow   Porthy      melancholy
yesterday  NULL        morose
someday    NULL
Mon        Echawr      ecstatic
Llano      blue
Llany      suspicious
NULL       sleeping
Peny
Llandy     piqued
yesterday  Llaisy      oppressed

Oh. That almost works, but we forgot to consider all the possible cases. We need to worry about leading empty fields (beginning of line followed by delimiter), consecutive internal empty fields (delimiter delimiter delimiter), and trailing empty fields (delimiter followed by newline). Like this:

cat test.csv | sed "s/,,/,NULL,/g" | sed "s/,,/,NULL,/g" | sed "s/^,/NULL,/g" | sed "s/,$/,NULL/g" | column -s , -tx
date       name    status
tomorrow   Porthy  melancholy
yesterday  NULL    morose
someday    NULL    NULL
Mon        Echawr  ecstatic
NULL       Llano   blue
NULL       Llany   suspicious
NULL       NULL    sleeping
NULL       Peny    NULL
NULL       Llandy  piqued
yesterday  Llaisy  oppressed

Nice. Although … that command line seems very complicated. Better to spend a few hours learning about zparseopts and abstract it into a shell function:

source fillna.zsh
cat test.csv | fillna --separator , --fill-value NULL | column -s , -tx
date       name    status
tomorrow   Porthy  melancholy
yesterday  NULL    morose
someday    NULL    NULL
Mon        Echawr  ecstatic
NULL       Llano   blue
NULL       Llany   suspicious
NULL       NULL    sleeping
NULL       Peny    NULL
NULL       Llandy  piqued
yesterday  Llaisy  oppressed

Very nice. Although … this seems like a problem that other people must have encountered before.

Ah. Yes, yes it is. Better to spend a couple of days learning the DSLs for miller, tab, xsv, and q to conduct a comprehensive evaluation …

function fillna {
# Make `opts` an associative array and set default values. `local` is the
# same as `typeset`; this also makes the parameter local to this function.
typeset -A opts=( -s , -v NULL )
# Parse options. See `man 1 zshmodules` for more details; `zparseopts` is at
# the very end. `-K` means to preserve the values in the target array if
# they don’t appear in the positional parameters; `-M` allows us to set up
# mappings so that `-separator:=s` will put the `--separator` option into
# whatever location the `-s` option uses; `-A opts` provides the associative
# (-A) array to store the options in. Colons indicate the option takes a
# mandatory argument.
zparseopts -K -M -A opts -- s: -separator:=s v: -fill-value:=v
sed "s/${opts[-s]}${opts[-s]}/${opts[-s]}${opts[-v]}${opts[-s]}/g" \
| sed "s/${opts[-s]}${opts[-s]}/${opts[-s]}${opts[-v]}${opts[-s]}/g" \
| sed "s/^${opts[-s]}/${opts[-v]},/g" \
| sed "s/${opts[-s]}$/${opts[-s]}${opts[-v]}/g"
}
date name status
tomorrow Porthy melancholy
yesterday morose
someday
Mon Echawr ecstatic
Llano blue
Llany suspicious
sleeping
Peny
Llandy piqued
yesterday Llaisy oppressed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment