
@mrm1001
Created August 11, 2018 15:42
FastText
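void Dictionary::readFromFile(std::istream& in) {
  std::string word;
  int64_t minThreshold = 1;
  while (readWord(in, word)) {
    add(word);
    // Report progress every million tokens when verbose output is enabled.
    if (ntokens_ % 1000000 == 0 && args_->verbose > 1) {
      std::cerr << "\rRead " << ntokens_ / 1000000 << "M words" << std::flush;
    }
    // Once the vocabulary passes 75% of capacity, raise the count cutoff
    // and prune rare words and labels so the table does not fill up.
    if (size_ > 0.75 * MAX_VOCAB_SIZE) {
      minThreshold++;
      threshold(minThreshold, minThreshold);
    }
  }
  // Apply the user-configured minimum counts for words and labels.
  threshold(args_->minCount, args_->minCountLabel);
  initTableDiscard();
  initNgrams();
  // ... (the excerpt ends here; the rest of the function is not shown)
}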
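
The prune-while-reading pattern above can be tried in isolation with a minimal sketch. `ToyDictionary`, `readFromStream`, and the `maxVocabSize` constructor argument are hypothetical names made up for this illustration, not part of fastText. The sketch assumes that `threshold(t)` drops every entry seen fewer than `t` times, and it uses a plain `std::unordered_map` in place of fastText's own hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for fastText's Dictionary (illustration only).
class ToyDictionary {
 public:
  explicit ToyDictionary(std::size_t maxVocabSize)
      : maxVocabSize_(maxVocabSize) {
    assert(maxVocabSize > 0);
  }

  // Count whitespace-separated tokens. Whenever the table grows past 75% of
  // its capacity, raise the cutoff and prune rare words; apply the
  // caller-supplied minimum count once the stream is exhausted.
  void readFromStream(std::istream& in, int64_t minCount) {
    std::string word;
    int64_t minThreshold = 1;
    while (in >> word) {
      ++counts_[word];
      ++ntokens_;
      if (counts_.size() > 0.75 * maxVocabSize_) {
        ++minThreshold;
        threshold(minThreshold);
      }
    }
    threshold(minCount);
  }

  // Erase every word seen fewer than `t` times (assumed semantics).
  void threshold(int64_t t) {
    for (auto it = counts_.begin(); it != counts_.end();) {
      it = (it->second < t) ? counts_.erase(it) : std::next(it);
    }
  }

  std::size_t size() const { return counts_.size(); }
  int64_t ntokens() const { return ntokens_; }

  // Returns 0 for words that were never seen or have been pruned.
  int64_t count(const std::string& w) const {
    auto it = counts_.find(w);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::size_t maxVocabSize_;
  int64_t ntokens_ = 0;
  std::unordered_map<std::string, int64_t> counts_;
};
```

With a capacity of 4 and the input `a a a b c d e a`, the fourth distinct word (`d`) pushes the table past 75% full, so the cutoff rises to 2 and `b`, `c`, and `d` are dropped. Only `a` and `e` remain at the end. A side effect of this scheme is that a word pruned early restarts from zero if it appears again later, so the surviving counts can undercount.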