@pasqLisena
Created February 1, 2019 10:18
Brute force turtle files splitter
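/*
 * Brute force turtle files splitter
 *
 * Splits a Turtle (.ttl) file into chunks of at least --maxlines lines,
 * cutting only at blank lines and repeating the @prefix header in every chunk.
 *
 * The argument setup uses the argparse 1.x API (addArgument, defaultValue),
 * so install that major version:
 * npm install argparse@1 fs-extra
 * node index.js --input /Users/pasquale/Desktop/aat/all/redomi_part1.ttl --output ./out
 */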
const fs = require('fs-extra');
const { ArgumentParser } = require('argparse');

// setup arguments
const parser = new ArgumentParser({
  version: '0.0.1',
  addHelp: true,
  description: 'Turtle splitter',
});
parser.addArgument(
  ['-i', '--input'],
  {
    required: true,
    help: 'Input turtle (.ttl) file.',
  },
);
parser.addArgument(
  ['-o', '--output'],
  {
    defaultValue: './out',
    help: 'Output folder',
  },
);
parser.addArgument(
  ['-l', '--maxlines'],
  {
    type: Number,
    defaultValue: 10000,
    help: 'Max number of lines per output file.',
  },
);
const args = parser.parseArgs();
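
// read the whole input in memory and make sure the output folder exists
const txt = fs.readFileSync(args.input, 'utf8');
fs.ensureDirSync(args.output);

// reversed, so that pop() yields the lines in their original order
const lines = txt.split('\n').reverse();

// collect the leading @prefix declarations: they are repeated in every chunk.
// Peek at the next line instead of popping it, so that the first non-prefix
// line is not lost and a prefix-only file does not crash on undefined.
const pfx = [];
while (lines.length && lines[lines.length - 1].startsWith('@prefix')) {
  pfx.push(lines.pop());
}
const prefixes = `${pfx.join('\n')}\n\n`;

let i = 0; // number of lines in the current chunk
let j = 0; // index of the next output file
let curtxt = prefixes;
while (lines.length) {
  const line = lines.pop();
  i++;
  curtxt += `${line}\n`;
  // cut only at a blank line, so a statement is never split across files
  // (this assumes that statements are separated by blank lines)
  if (i >= args.maxlines && !line.trim()) {
    // save file
    fs.writeFileSync(`${args.output}/${j++}.ttl`, curtxt, 'utf8');
    i = 0;
    curtxt = prefixes;
  }
}

// save the remaining lines, unless the last chunk holds only the prefixes
if (curtxt !== prefixes) {
  fs.writeFileSync(`${args.output}/${j++}.ttl`, curtxt, 'utf8');
}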