@Hugo-ter-Doest
Last active January 13, 2019 20:49
Converts a flat Brown corpus file into a JSON object consisting of tagged sentences.
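// Convert a flat Brown corpus excerpt into a JSON file of tagged sentences.
// The data paths are resolved against the current working directory (run from the
// repository root); the require paths are resolved against this script's location.
const fs = require('fs');
const Corpus = require('../../lib/natural/brill_pos_tagger/lib/Corpus');
const SentenceClass = require('../../lib/natural/brill_pos_tagger/lib/Sentence');

const inputFile = './spec/test_data/browntag_nolines_excerpt.txt';
const outputFile = './spec/test_data/browntag_nolines_excerpt.json';

const data = fs.readFileSync(inputFile, 'utf8');
// The second argument selects the input format (1 = Brown); SentenceClass builds each sentence.
const corpus = new Corpus(data, 1, SentenceClass);
// Serialize the corpus with 2-space indentation.
fs.writeFileSync(outputFile, JSON.stringify(corpus, null, 2));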
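For readers without the natural library at hand, the kind of conversion described above can be sketched in plain JavaScript. This is a hypothetical illustration, not natural's Corpus implementation: `parseBrownSentences` is an invented name, and the sketch assumes whitespace-separated `word/tag` tokens in which a token tagged `.` (the Brown tag for sentence-closing punctuation) ends a sentence.

```javascript
// Hypothetical sketch (not natural's Corpus code): group Brown-style "word/tag"
// tokens into sentences. Assumes whitespace-separated tokens and that a token
// tagged "." closes a sentence.
function parseBrownSentences(text) {
  const sentences = [];
  let current = [];
  for (const raw of text.split(/\s+/)) {
    if (raw === '') continue; // skip empty strings from leading/trailing whitespace
    // Split on the LAST slash so words containing "/" (e.g. "1/2/cd") keep it.
    const slash = raw.lastIndexOf('/');
    if (slash <= 0) continue; // ignore tokens without a usable word/tag split
    const token = raw.slice(0, slash);
    const tag = raw.slice(slash + 1);
    current.push({ token, tag });
    if (tag === '.') {
      // Sentence-final punctuation closes the current sentence.
      sentences.push(current);
      current = [];
    }
  }
  if (current.length > 0) sentences.push(current); // keep an unterminated trailing sentence
  return sentences;
}

// Usage: two sentences, printed as JSON in the same style as the script above.
const sentences = parseBrownSentences('The/at jury/nn said/vbd ./. It/pps agreed/vbd ./.');
console.log(JSON.stringify(sentences, null, 2));
```

Splitting on the last slash rather than the first keeps tokens such as fractions intact, since Brown tags themselves never contain a slash in this simplified assumption.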