@montyhall
Created March 15, 2016 05:21
ClassPathResource resource1 = new ClassPathResource("TopicClassifier/DimMetric-Train.txt");
InputStream is = resource1.getInputStream();
LabelAwareListSentenceIterator iterator1 = new LabelAwareListSentenceIterator(is);
TokenizerFactory t = new DefaultTokenizerFactory();
t.setTokenPreProcessor(new CommonPreprocessor());
StopWatch sw = new StopWatch();
sw.start();
// ParagraphVectors training configuration
ParagraphVectors paragraphVectors = new ParagraphVectors.Builder()
        .useAdaGrad(useAdaGrade)
        .learningRate(learnRate)
        .minLearningRate(minLearnRate)
        .batchSize(batchSize)
        .epochs(epochs)
        .trainWordVectors(true)
        .tokenizerFactory(t)
        .minWordFrequency(minWordFreq)
        .stopWords(stopwords)
        .iterations(iterations)
        .layerSize(layerSize)
        .windowSize(windowSize)
        .iterate(iterator1)
        .build();
// Start model training
paragraphVectors.fit();
sw.stop();
log.info("...took: " + sw.getTime() + " ms");
serialize(paragraphVectors);
ClassPathResource testResource = new ClassPathResource("TopicClassifier/DimMetric-Test.text");
InputStream is2 = testResource.getInputStream();
// Read the test set from its own stream (is2), not the training stream (is)
LabelAwareListSentenceIterator iterator2 = new LabelAwareListSentenceIterator(is2);
while (iterator2.hasNext()) {
    String nextSentence = iterator2.nextSentence();
}