@ceving
Last active January 26, 2024 08:59
Processing tabular data with JQ instead of AWK

The following command produces a table with two tab-separated columns: the modification time in seconds since the epoch and the file name.

$ find /etc/network -type f -printf '%T@\t%p\n'
1670618223.0000000000   /etc/network/if-down.d/resolved
1689161114.2422386520   /etc/network/interfaces
1673911860.0000000000   /etc/network/if-up.d/ntpsec-ntpdate
1670618223.0000000000   /etc/network/if-up.d/resolved

Normally AWK is used to process such data, but JQ can process it as well. The input is split twice: first, the -R option makes JQ read the raw input line by line, with each line becoming a string, and second, split("\t") splits each line into columns. The result is a stream of arrays.

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' 
[
  "1670618223.0000000000",
  "/etc/network/if-down.d/resolved"
]
[
  "1689161114.2422386520",
  "/etc/network/interfaces"
]
[
  "1673911860.0000000000",
  "/etc/network/if-up.d/ntpsec-ntpdate"
]
[
  "1670618223.0000000000",
  "/etc/network/if-up.d/resolved"
]

Often it is useful to give each column a semantic meaning by converting the stream of arrays into a stream of objects. In this step it can also be useful to convert the data to native value types, for example with tonumber.

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}'
{
  "date": 1670618223,
  "name": "/etc/network/if-down.d/resolved"
}
{
  "date": 1689161114.2422388,
  "name": "/etc/network/interfaces"
}
{
  "date": 1673911860,
  "name": "/etc/network/if-up.d/ntpsec-ntpdate"
}
{
  "date": 1670618223,
  "name": "/etc/network/if-up.d/resolved"
}
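The two JQ invocations above can also be combined into a single filter, which saves one process. The following is a minimal sketch: the function name parse_table is hypothetical, and the sample line is made up to stand in for the find output.

```shell
# Hypothetical helper: turn "<time>\t<name>" lines into JSON objects
# with one jq process instead of two.
parse_table() {
  # -R reads each line as a raw string, -c prints one compact object per line.
  jq -c -R 'split("\t") | {date: (.[0] | tonumber), name: .[1]}'
}

# Made-up sample line standing in for the find output.
printf '1670618223\t/etc/network/if-up.d/resolved\n' | parse_table
```

The parentheses around .[0] | tonumber are optional here, but they make the scope of the pipe inside the object easier to see.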

Some JQ operations, such as sort_by, cannot work on a stream because they need all values at once. This makes it necessary to "slurp" the input stream into an array with the -s option.

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}' |
> jq -s 'sort_by(.date)' 
[
  {
    "date": 1670618223,
    "name": "/etc/network/if-down.d/resolved"
  },
  {
    "date": 1670618223,
    "name": "/etc/network/if-up.d/resolved"
  },
  {
    "date": 1673911860,
    "name": "/etc/network/if-up.d/ntpsec-ntpdate"
  },
  {
    "date": 1689161114.2422388,
    "name": "/etc/network/interfaces"
  }
]
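As an alternative to a separate jq -s step, the array can be collected inside the filter with -n and the inputs builtin. This is only a sketch of that variant: the function name sort_by_date and the sample rows are made up for illustration.

```shell
# Hypothetical helper: read raw lines, parse them, and sort them by date
# in a single jq process.
sort_by_date() {
  # -n stops jq from consuming the first line itself, so that `inputs`
  # yields every line; [ ... ] collects the parsed objects into one array.
  jq -c -R -n '
    [inputs | split("\t") | {date: (.[0] | tonumber), name: .[1]}]
    | sort_by(.date)
    | .[]'
}

# Made-up sample rows, deliberately out of order.
printf '20\t/b\n10\t/a\n' | sort_by_date
```

Whether this reads better than three small jq processes is a matter of taste; the step-by-step pipeline is easier to debug one stage at a time.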

Finally, the processed data can be converted back into the original tabular format by generating raw strings with the -r option.

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R 'split("\t")' |
> jq '{date:.[0]|tonumber,name:.[1]}' |
> jq -s 'sort_by(.date)' |
> jq -r '.[]|"\(.date)\t\(.name)"'
1670618223      /etc/network/if-down.d/resolved
1670618223      /etc/network/if-up.d/resolved
1673911860      /etc/network/if-up.d/ntpsec-ntpdate
1689161114.2422388      /etc/network/interfaces
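Instead of string interpolation, the rows can also be written with JQ's @tsv format, which additionally escapes tabs and newlines that occur inside a value. A minimal sketch follows; the function name to_tsv and the sample array are assumptions made for this example.

```shell
# Hypothetical helper: convert an array of {date, name} objects
# into tab-separated lines.
to_tsv() {
  # @tsv expects an array of scalars and joins them with tabs.
  jq -r '.[] | [.date, .name] | @tsv'
}

# Made-up sample array standing in for the sorted data.
printf '%s' '[{"date":10,"name":"/a"},{"date":20,"name":"/b c"}]' | to_tsv
```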

The value conversion has one drawback: it might be impossible to reproduce the original values, because the text formatting functions of JQ are limited. In the output above, the time stamps no longer carry their original ten fractional digits.

If the original input must be kept exactly, it has to be carried along with the parsed values:

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R '{line:.,columns:split("\t")|{date:.[0]|tonumber,name:.[1]}}'
{
  "line": "1670618223.0000000000\t/etc/network/if-down.d/resolved",
  "columns": {
    "date": 1670618223,
    "name": "/etc/network/if-down.d/resolved"
  }
}
{
  "line": "1689161114.2422386520\t/etc/network/interfaces",
  "columns": {
    "date": 1689161114.2422388,
    "name": "/etc/network/interfaces"
  }
}
{
  "line": "1673911860.0000000000\t/etc/network/if-up.d/ntpsec-ntpdate",
  "columns": {
    "date": 1673911860,
    "name": "/etc/network/if-up.d/ntpsec-ntpdate"
  }
}
{
  "line": "1670618223.0000000000\t/etc/network/if-up.d/resolved",
  "columns": {
    "date": 1670618223,
    "name": "/etc/network/if-up.d/resolved"
  }
}

Now the data can be processed as in the first example, but instead of generating new output, the preserved input lines can be returned.

$ find /etc/network -type f -printf '%T@\t%p\n' |
> jq -R '{line:.,columns:split("\t")|{date:.[0]|tonumber,name:.[1]}}' |
> jq -s 'sort_by(.columns.date)' |
> jq -r '.[].line'
1670618223.0000000000   /etc/network/if-down.d/resolved
1670618223.0000000000   /etc/network/if-up.d/resolved
1673911860.0000000000   /etc/network/if-up.d/ntpsec-ntpdate
1689161114.2422386520   /etc/network/interfaces

Each data processing task with JQ has exactly three steps:

  1. Read raw data and split the input (jq -R ...).
  2. Process the data with (jq -s ...) or without slurping (jq ...).
  3. Write raw data (jq -r ...).
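The three steps can be sketched with a filter instead of a sort, which shows the variant of step 2 that needs no slurping. The function name newer_than, its threshold argument, and the sample rows are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: print the original lines whose time stamp is
# greater than the number given as the first argument.
newer_than() {
  # 1. Read raw lines and keep each one next to its parsed time stamp.
  jq -c -R '{line: ., date: (split("\t")[0] | tonumber)}' |
  # 2. Process the stream; select works per value, so no -s is needed.
  jq -c --argjson min "$1" 'select(.date > $min)' |
  # 3. Write the preserved input lines as raw text.
  jq -r '.line'
}

# Made-up sample rows; only the second one is newer than 20.
printf '10.5000\t/a\n30.2500\t/b\n' | newer_than 20
```

Because the original line is carried along unchanged, the fractional digits of the time stamp survive even though the comparison uses the converted number.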