
@bzerangue
Last active January 31, 2024 20:57
JSON to NDJSON

NDJSON is a convenient format for storing or streaming structured data that may be processed one record at a time.

  • Each line is a valid JSON value
  • Line separator is ‘\n’
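For instance, a two-line NDJSON file can be produced and checked like this (the file name sample.ndjson is just a placeholder):

```shell
# Two standalone JSON objects, one per line — valid NDJSON:
printf '%s\n' '{"id":1,"name":"alpha"}' '{"id":2,"name":"beta"}' > sample.ndjson

# jq reads a stream of JSON values, so each line parses on its own:
jq -c '.' sample.ndjson
```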

1. Convert JSON to NDJSON

jq -c '.[]' test.json > testNDJSON.json

With this one line, you can convert and save files in NDJSON format: the '.[]' filter iterates over the elements of the top-level array, and -c (compact) prints each element on its own line. Note that this assumes the input's top level is an array (or object) whose elements become the output records.
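The conversion also works in reverse: jq's slurp flag (-s) collects an entire NDJSON stream back into a single JSON array. A quick sketch, reusing the file names above:

```shell
# --slurp (-s) gathers every input value into one array,
# turning NDJSON back into a regular JSON file:
jq -s '.' testNDJSON.json > test_roundtrip.json
```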

Note: jq is a lightweight and flexible command-line JSON processor.
https://stedolan.github.io/jq/

Source: https://medium.com/datadriveninvestor/json-parsing-error-how-to-load-json-into-bigquery-successfully-using-ndjson-2b7d94616bcb

@dreamyguy

Needed to get the same done in node, recursively, and this worked for me:

import fs from 'fs';
import jq from 'node-jq';

const pathInput = './export/json/';
const pathOutput = './export/ndjson/';
const fileExtensionSource = '.json';
const fileExtensionExport = '.json';

const writeSanityObjectToFileSyncAsNdjson = fileName => {
  // Derive the output path: same file name with an 'NDJSON' suffix,
  // written under pathOutput instead of pathInput
  const fileNameWithoutExtension = fileName.replace(fileExtensionSource, '');
  const fileNameOutput = `${fileNameWithoutExtension}NDJSON${fileExtensionExport}`;
  const pathOutputWithFileName = fileNameOutput.replace(pathInput, pathOutput);
  console.log(`\n`);
  // node-jq runs the same '.[]' filter as the jq CLI, with compact output
  jq.run('.[]', fileName, { output: 'compact' })
    .then((output) => {
      fs.writeFileSync(pathOutputWithFileName, output);
      console.log(`✨ The file '${fileName}' was converted to NDJSON!`);
    })
    .catch((err) => {
      console.error(`🐛  Something went wrong: ${err}`);
    });
};

const jsonToNDJSON = () => {
  if (!fs.existsSync(pathInput)) {
    console.log(`dir: ${pathInput} does not exist!`);
    return;
  }
  // Make sure the output directory exists before writing into it
  fs.mkdirSync(pathOutput, { recursive: true });
  const files = fs.readdirSync(pathInput);
  files.forEach((fileName) => {
    if (fileName !== '.DS_Store') {
      const pathInputWithFileName = `${pathInput}${fileName}`;
      const stat = fs.lstatSync(pathInputWithFileName);
      // Match only files that end with the source extension
      // (leading '\\' escapes the dot in '.json')
      const regex = new RegExp(`\\${fileExtensionSource}$`, 'i');
      if (!stat.isDirectory() && regex.test(pathInputWithFileName)) {
        writeSanityObjectToFileSyncAsNdjson(pathInputWithFileName);
      }
    }
  });
};

jsonToNDJSON();
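If pulling in node-jq is more than you need, a plain-shell sketch of the same per-file conversion (assuming the ./export/json/ and ./export/ndjson/ layout above) could look like:

```shell
# Convert every .json file under ./export/json/ to NDJSON,
# mirroring the node script above (the paths are assumptions):
mkdir -p ./export/ndjson
for f in ./export/json/*.json; do
  name=$(basename "$f" .json)
  jq -c '.[]' "$f" > "./export/ndjson/${name}NDJSON.json"
done
```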

@m9aertner

Useful as a stepping stone for creating input data for Elasticsearch bulk API.
Concrete example:

$ jq -c '.a | .[]' <<END
{
    "a": [
        {
            "a1": 1
        },
        {
            "a2": 2
        }
    ]
}
END

Output:

{"a1":1}
{"a2":2}
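Building on this, the _bulk endpoint expects an action line before every document; jq can interleave both in one pass. A sketch (the index name "demo" is an assumption):

```shell
# Emit an {"index": …} action line before each document,
# producing a body that is ready for POST /_bulk:
jq -c '.a[] | {"index":{"_index":"demo"}}, .' <<'END'
{"a":[{"a1":1},{"a2":2}]}
END
# → {"index":{"_index":"demo"}}
#   {"a1":1}
#   {"index":{"_index":"demo"}}
#   {"a2":2}
```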

@UweW

UweW commented Nov 4, 2021

I have an issue with my JSON data for Elasticsearch as well.
My challenge is that I need something similar to @m9aertner's example, but my nesting goes one level deeper.

{
   "x": {
        "a": [
            {
                "a1": 1
            },
            {
                "a2": 2
            }
        ]
    }
}

should result in:

{"x":{"a1":1}}
{"x":{"a2":2}}

@m9aertner

@UweW try

jq -c 'to_entries[] | { (.key) : (.value | .[] | .[]) }' <<<'{ "x": { "a": [ { "a1": 1 }, { "a2": 2 } ] } }'
{"x":{"a1":1}}
{"x":{"a2":2}}
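When the outer key is known in advance, a shorter filter that hard-codes it gives the same result:

```shell
# Iterate the nested array directly, then re-wrap each element
# under the known outer key "x":
jq -c '.x.a[] | {x: .}' <<'END'
{ "x": { "a": [ { "a1": 1 }, { "a2": 2 } ] } }
END
# → {"x":{"a1":1}}
#   {"x":{"a2":2}}
```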

@draxil

draxil commented Aug 9, 2022

jq can choke on very large files, and be slow. For these situations I made json2nd.

@bzerangue
Author

jq can choke on very large files, and be slow. For these situations I made json2nd.

Thanks for sharing @draxil !
