@loretoparisi
Created July 26, 2017 13:05
DeepLearning4J Vocabulary construction
12:55:05.736 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Target vocab size before building: [0]
12:55:05.738 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Trying source iterator: [0]
12:55:05.738 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Target vocab size before building: [0]
12:55:05.809 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Wating till all processes stop...
12:55:05.810 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Vocab size before truncation: [727], NumWords: [16323], sequences parsed: [41], counter: [16323]
12:55:05.812 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Scavenger: Words before: 727; Words after: 727;
12:55:05.812 [main] DEBUG o.d.m.w.wordstore.VocabConstructor - Vocab size after truncation: [727], NumWords: [16323], sequences parsed: [41], counter: [16323]
12:55:08.208 [main] INFO o.d.m.w.wordstore.VocabConstructor - Sequences checked: [41], Current vocabulary size: [727]; Sequences/sec: [16.59];
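The log above reports the vocabulary size before and after truncation (727 in both cases). As a rough illustration of what counting words and truncating by a minimum frequency means, here is a minimal sketch in plain Java. It is not DL4J's `VocabConstructor`: the class `VocabSketch`, its methods, and the whitespace tokenization are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not DL4J's VocabConstructor.
class VocabSketch {

    // Count how often each whitespace-separated token occurs across all sequences.
    static Map<String, Integer> countWords(List<String> sequences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sequence : sequences) {
            for (String token : sequence.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keep only the words seen at least minWordFrequency times.
    static Map<String, Integer> truncate(Map<String, Integer> counts, int minWordFrequency) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sequences = Arrays.asList("the cat sat", "the dog sat", "the end");
        Map<String, Integer> counts = countWords(sequences);
        System.out.println("Vocab size before truncation: " + counts.size());
        System.out.println("After truncation (min 1): " + truncate(counts, 1).size());
        System.out.println("After truncation (min 2): " + truncate(counts, 2).size());
    }
}
```

In this sketch a minimum frequency of 1 removes nothing, since every counted word occurs at least once. That would be consistent with the unchanged size in the log, assuming DL4J's truncation follows the same rule.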
@loretoparisi
Author

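import org.deeplearning4j.bagofwords.vectorizer.TfidfVectorizer;
import org.deeplearning4j.text.sentenceiterator.CollectionSentenceIterator;

// `sentence` is passed to CollectionSentenceIterator, which takes a Collection<String>.
TfidfVectorizer tfidfVectorizer = new TfidfVectorizer.Builder()
        .setIterator(new CollectionSentenceIterator(sentence))
        .setTokenizerFactory(this.tokenizer)
        .setMinWordFrequency(1)            // keep every word that occurs at least once
        .allowParallelTokenization(false)  // tokenize on a single thread
        .build();
// Builds the vocabulary; this call produces the VocabConstructor log output above.
tfidfVectorizer.fit();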
