ceving/Processing tabular data with JQ instead of AWK.md

## Processing tabular data with JQ instead of AWK.md

      
    Processing tabular data with JQ instead of AWK

The following produces a table with two columns: a date and a file name.
$ find /etc/network -type f -printf '%T@\t%p\n'
1670618223.0000000000   /etc/network/if-down.d/resolved
1689161114.2422386520   /etc/network/interfaces
1673911860.0000000000   /etc/network/if-up.d/ntpsec-ntpdate
1670618223.0000000000   /etc/network/if-up.d/resolved

Normally AWK is used to process such data, but JQ can process it as well.
The following reads raw input and splits it twice: first, the -R option makes
JQ read the input line by line and pass each line to the filter as a string;
second, split("\t") splits each line into columns. The result is a stream of
arrays.
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' 
[
  "1670618223.0000000000",
  "/etc/network/if-down.d/resolved"
]
[
  "1689161114.2422386520",
  "/etc/network/interfaces"
]
[
  "1673911860.0000000000",
  "/etc/network/if-up.d/ntpsec-ntpdate"
]
[
  "1670618223.0000000000",
  "/etc/network/if-up.d/resolved"
]
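
To try the splitting step without depending on the contents of /etc/network,
the input can be faked with printf. This is only a sketch: sample_rows is a
hypothetical helper, and its two rows are copied from the listing above. The
-c option is added merely to print each array on one line.

```shell
# Hypothetical helper: emits two tab-separated rows in the layout of the
# find output above.
sample_rows() {
  printf '%s\t%s\n' \
    '1670618223.0000000000' '/etc/network/if-down.d/resolved' \
    '1689161114.2422386520' '/etc/network/interfaces'
}

# -R: every input line is handed to the filter as one JSON string.
# -c: compact output, one array per line.
sample_rows | jq -R -c 'split("\t")'
# ["1670618223.0000000000","/etc/network/if-down.d/resolved"]
# ["1689161114.2422386520","/etc/network/interfaces"]
```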

Often it is useful to give each column a semantic meaning by converting
the stream of arrays into a stream of objects. In this step it can also be
useful to convert the data to native value types, such as numbers.
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}'
{
  "date": 1670618223,
  "name": "/etc/network/if-down.d/resolved"
}
{
  "date": 1689161114.2422388,
  "name": "/etc/network/interfaces"
}
{
  "date": 1673911860,
  "name": "/etc/network/if-up.d/ntpsec-ntpdate"
}
{
  "date": 1670618223,
  "name": "/etc/network/if-up.d/resolved"
}
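
The two JQ invocations do not have to be separate processes; the filters can
be joined with a pipe inside one program. A minimal sketch, assuming a
hypothetical wrapper named to_objects and a made-up input row with a
whole-second timestamp:

```shell
# Hypothetical wrapper: split each raw line and build the object in one
# jq program. The parentheses make the scope of tonumber explicit.
to_objects() {
  jq -R -c 'split("\t") | {date: (.[0] | tonumber), name: .[1]}'
}

# Made-up sample row.
printf '1673911860\t/etc/network/if-up.d/ntpsec-ntpdate\n' | to_objects
# {"date":1673911860,"name":"/etc/network/if-up.d/ntpsec-ntpdate"}
```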

Some JQ operations, such as sort_by, cannot work on a stream. This makes it
necessary to "slurp" the input stream into an array with the -s option.
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}' |
> jq -s 'sort_by(.date)' 
[
  {
    "date": 1670618223,
    "name": "/etc/network/if-down.d/resolved"
  },
  {
    "date": 1670618223,
    "name": "/etc/network/if-up.d/resolved"
  },
  {
    "date": 1673911860,
    "name": "/etc/network/if-up.d/ntpsec-ntpdate"
  },
  {
    "date": 1689161114.2422388,
    "name": "/etc/network/interfaces"
  }
]
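
An alternative to a separate jq -s stage is to collect the stream inside the
program: with -n, JQ does not consume any input by itself, and the inputs
builtin then yields every line (as a raw string, because of -R). The
following is a sketch only; sort_rows is a hypothetical name and the three
rows are made-up data.

```shell
# Hypothetical wrapper: read all lines, convert them to objects, collect
# them into one array and sort it, all in a single jq process.
sort_rows() {
  jq -R -n -c '
    [inputs | split("\t") | {date: (.[0] | tonumber), name: .[1]}]
    | sort_by(.date)'
}

# Made-up rows in unsorted order.
printf '3\t/c\n1\t/a\n2\t/b\n' | sort_rows
# [{"date":1,"name":"/a"},{"date":2,"name":"/b"},{"date":3,"name":"/c"}]
```

Without -n the first line would be consumed as the regular input and would be
missing from the array built from inputs.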

Finally the processed data can be converted back into the original format
by generating raw strings.
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}' |
> jq -s 'sort_by(.date)' |
> jq -r '.[]|"\(.date)\t\(.name)"'
1670618223      /etc/network/if-down.d/resolved
1670618223      /etc/network/if-up.d/resolved
1673911860      /etc/network/if-up.d/ntpsec-ntpdate
1689161114.2422388      /etc/network/interfaces
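
Instead of string interpolation, the @tsv format can build the output row
from an array. It also escapes tabs and newlines that occur inside a value,
which the interpolated string does not. A small sketch, with to_tsv as a
hypothetical name and a hand-written JSON array as input:

```shell
# Hypothetical wrapper: turn an array of {date, name} objects back into
# tab-separated lines.
to_tsv() {
  jq -r '.[] | [.date, .name] | @tsv'
}

# Hand-written input array; prints two lines, "1<TAB>/a" and "2<TAB>/b".
echo '[{"date":1,"name":"/a"},{"date":2,"name":"/b"}]' | to_tsv
```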

The value conversion has one drawback: it might be impossible to
reproduce the original values, because the text formatting functions
of JQ are limited. In the output above, 1670618223.0000000000 came back
as 1670618223 and 1689161114.2422386520 as 1689161114.2422388.
If it is necessary to keep the original input exactly, it has to be
merged with the parsed values:
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R '{line:.,columns:split("\t")|{date:.[0]|tonumber,name:.[1]}}'
{
  "line": "1670618223.0000000000\t/etc/network/if-down.d/resolved",
  "columns": {
    "date": 1670618223,
    "name": "/etc/network/if-down.d/resolved"
  }
}
{
  "line": "1689161114.2422386520\t/etc/network/interfaces",
  "columns": {
    "date": 1689161114.2422388,
    "name": "/etc/network/interfaces"
  }
}
{
  "line": "1673911860.0000000000\t/etc/network/if-up.d/ntpsec-ntpdate",
  "columns": {
    "date": 1673911860,
    "name": "/etc/network/if-up.d/ntpsec-ntpdate"
  }
}
{
  "line": "1670618223.0000000000\t/etc/network/if-up.d/resolved",
  "columns": {
    "date": 1670618223,
    "name": "/etc/network/if-up.d/resolved"
  }
}

Now the data can be processed as in the first example. But instead
of generating new output, the preserved input lines can be returned.
$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R '{line:.,columns:split("\t")|{date:.[0]|tonumber,name:.[1]}}' |
> jq -s 'sort_by(.columns.date)' |
> jq -r '.[].line'
1670618223.0000000000   /etc/network/if-down.d/resolved
1670618223.0000000000   /etc/network/if-up.d/resolved
1673911860.0000000000   /etc/network/if-up.d/ntpsec-ntpdate
1689161114.2422386520   /etc/network/interfaces
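
The same idea also fits into a single JQ process. The sketch below assumes a
hypothetical wrapper named sort_lines_by_date; it keeps each raw line next to
its parsed sort key and prints only the lines at the end. The input rows are
made up.

```shell
# Hypothetical wrapper: sort tab-separated lines numerically by their
# first column and print the lines unchanged.
sort_lines_by_date() {
  jq -R -r -n '
    [inputs | {line: ., date: (split("\t")[0] | tonumber)}]
    | sort_by(.date)
    | .[].line'
}

# Made-up rows; the output is the same three lines ordered 1, 2, 3.
printf '3\tc\n1\ta\n2\tb\n' | sort_lines_by_date
```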

Each data processing job with JQ has three steps:

1. Read raw data and split the input (jq -R ...).
2. Process the data with slurping (jq -s ...) or without it (jq ...).
3. Write raw data (jq -r ...).
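
When the processing step works row by row, no slurping is needed and the
three steps fit into one short filter. As an illustration only: newer_than is
a hypothetical helper that keeps the rows whose first column is greater than
a threshold passed with --argjson; the rows are made up.

```shell
# Hypothetical helper: $1 is a numeric threshold.
# -R reads raw lines, select filters them one at a time (no -s),
# -r writes the surviving lines back as raw text.
newer_than() {
  jq -R -r --argjson min "$1" \
    'select((split("\t")[0] | tonumber) > $min)'
}

# Made-up rows; only the rows starting with 5 and 9 are printed.
printf '1\ta\n5\tb\n9\tc\n' | newer_than 4
```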