@rubykv
Created August 8, 2022 00:37
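```java
import java.io.*;
import opennlp.tools.doccat.*;
import opennlp.tools.ml.EventTrainer;
import opennlp.tools.util.*;
import org.springframework.util.ResourceUtils;

public static void generateModel() throws IOException {
    // Training data: one sample per line, "<category> <whitespace-separated text>".
    File file = ResourceUtils.getFile("src/main/resources/training-data.txt");
    InputStreamFactory inputStreamFactory = new MarkableFileInputStreamFactory(file);
    // Features: bag-of-words plus n-grams of 1 to 6 consecutive tokens.
    int minNgramSize = 1;
    int maxNgramSize = 6;
    DoccatFactory customFactory = new DoccatFactory(new FeatureGenerator[] {
            new BagOfWordsFeatureGenerator(),
            new NGramFeatureGenerator(minNgramSize, maxNgramSize) });
    // Maximum-entropy trainer: 10 iterations, ignore features seen fewer than 3 times.
    TrainingParameters mlParams = new TrainingParameters();
    mlParams.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
    mlParams.put(TrainingParameters.TRAINER_TYPE_PARAM, EventTrainer.EVENT_VALUE);
    mlParams.put(TrainingParameters.ITERATIONS_PARAM, 10);
    mlParams.put(TrainingParameters.CUTOFF_PARAM, 3);
    // try-with-resources closes the sample stream and the training file once training is done.
    DoccatModel model;
    try (ObjectStream<String> lineStream = new PlainTextByLineStream(inputStreamFactory, "UTF-8");
         ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream)) {
        model = DocumentCategorizerME.train("en", sampleStream, mlParams, customFactory);
    }
    // Create the output directory if it is missing, then write the binary model.
    new File("src/main/resources/nlp-model").mkdirs();
    try (OutputStream modelOut = new BufferedOutputStream(
            new FileOutputStream("src/main/resources/nlp-model/en-trained-model.bin"))) {
        model.serialize(modelOut);
    }
}
```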
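`generateModel()` feeds `training-data.txt` through `DocumentSampleStream`, which treats each line as one sample: the first whitespace-separated token is the category label and the remaining tokens are the document text. A blank line, or a line holding only a label, is rejected by OpenNLP. The sketch below is a plain-Java approximation of that line format, handy for sanity-checking a training file before training. It is not OpenNLP's code: the class `TrainingLineParser`, its `Sample` type and the example lines are hypothetical, and it signals bad lines with an `IllegalArgumentException` rather than whatever OpenNLP throws.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical checker for the "<category> <text...>" line format read by DocumentSampleStream. */
public class TrainingLineParser {

    /** One parsed training line: the category label and the document tokens. */
    public static final class Sample {
        public final String category;
        public final List<String> tokens;

        Sample(String category, List<String> tokens) {
            this.category = category;
            this.tokens = tokens;
        }
    }

    /** Splits a line on whitespace; the first token is the category, the rest is the text. */
    public static Sample parse(String line) {
        String[] parts = line.trim().split("\\s+");
        // A blank line, or a line holding only a label, has no document text to learn from.
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected '<category> <text>' but got: '" + line + "'");
        }
        return new Sample(parts[0], Arrays.asList(parts).subList(1, parts.length));
    }

    public static void main(String[] args) {
        // Made-up example lines in the format training-data.txt is expected to use.
        List<String> lines = List.of(
                "greeting hello there how are you",
                "farewell see you later");
        for (String line : lines) {
            Sample s = parse(line);
            System.out.println(s.category + " -> " + s.tokens);
        }
    }
}
```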
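The `DoccatFactory` in `generateModel()` combines `BagOfWordsFeatureGenerator` (one feature per token) with `NGramFeatureGenerator(1, 6)` (one feature per run of 1 to 6 consecutive tokens). For a sample of n tokens the n-gram generator alone yields the sum over k = 1..min(6, n) of (n − k + 1) features, so the feature space grows quickly with `maxNgramSize`; with `minNgramSize = 1` its unigrams also overlap with the bag-of-words features. The sketch below approximates what the two generators extract from one tokenized sample. The `bow=`/`ng=` prefixes and the `:` separator are illustrative assumptions, not a guaranteed match for OpenNLP's exact feature strings, and `FeatureSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical approximation of bag-of-words + n-gram feature extraction for one sample. */
public class FeatureSketch {

    /** One feature per token, as a bag-of-words generator would emit. */
    public static List<String> bagOfWords(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            features.add("bow=" + token);
        }
        return features;
    }

    /** One feature per run of minN..maxN consecutive tokens that fits inside the sample. */
    public static List<String> nGrams(String[] tokens, int minN, int maxN) {
        List<String> features = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            for (int n = minN; n <= maxN && start + n <= tokens.length; n++) {
                features.add("ng=" + String.join(":", Arrays.copyOfRange(tokens, start, start + n)));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"hello", "there", "how", "are", "you"};
        List<String> features = new ArrayList<>(bagOfWords(tokens));
        // Same n-gram range as generateModel(): minNgramSize = 1, maxNgramSize = 6.
        features.addAll(nGrams(tokens, 1, 6));
        // 5 bag-of-words features + 15 n-grams (5 + 4 + 3 + 2 + 1; no 6-gram fits in 5 tokens).
        System.out.println(features.size() + " features");
        features.forEach(System.out::println);
    }
}
```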
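`CUTOFF_PARAM` is set to 3 in `generateModel()`. Roughly, OpenNLP's data indexer counts how often each feature occurs across all training events and drops features seen fewer than `cutoff` times before training starts. With a small training file this can discard most of the long n-gram features, since a specific 5- or 6-token run rarely repeats three times. The sketch below shows count-based pruning in that spirit; it is a simplified stand-in, not OpenNLP's indexer, and `CutoffSketch` plus the sample feature lists are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of count-based feature pruning, in the spirit of CUTOFF_PARAM. */
public class CutoffSketch {

    /** Returns the features that occur at least {@code cutoff} times across all samples. */
    public static Set<String> keptFeatures(List<List<String>> samples, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> features : samples) {
            for (String f : features) {
                counts.merge(f, 1, Integer::sum); // count every occurrence
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up feature lists for three training samples.
        List<List<String>> samples = List.of(
                List.of("bow=hello", "bow=there"),
                List.of("bow=hello", "bow=you"),
                List.of("bow=hello", "bow=you"));
        // With cutoff 3 only bow=hello survives; bow=you (2) and bow=there (1) are pruned.
        System.out.println(keptFeatures(samples, 3));
    }
}
```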
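Setting `ALGORITHM_PARAM` to `"MAXENT"` selects OpenNLP's maximum-entropy trainer (trained with Generalized Iterative Scaling), and `ITERATIONS_PARAM` bounds how many training passes it makes, here 10. For orientation, the textbook conditional maximum-entropy model scores a category $c$ for a document $d$ as below, where each $f_j$ is a binary indicator pairing a generated feature (a bag-of-words or n-gram feature) with a category and each $\lambda_j$ is a weight learned during training. This is the standard formulation, not a transcription of OpenNLP's internals.

$$
p(c \mid d) = \frac{\exp\left(\sum_{j} \lambda_j f_j(d, c)\right)}{\sum_{c'} \exp\left(\sum_{j} \lambda_j f_j(d, c')\right)}
$$

In this formulation, the per-category probabilities that `DocumentCategorizerME.categorize` returns for a tokenized document correspond to $p(c \mid d)$ for each category seen in the training file.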