@takageymt
Created June 12, 2018 17:05
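#include <map>
#include <string>
#include <vector>

// Builds a deduplicated list of the words found on the given pages.
// Each distinct word gets a sequential id and records every
// (page id, position) at which it occurs.
std::vector<Word> get_all_words_in(const std::vector<Page>& pages) {
    std::vector<Word> words;
    std::map<std::string, int> word_conv;  // word text -> index into `words`
    int num_words = 0;
    for(const Page& page : pages) {
        // get_words_in and filter_with_regex are defined elsewhere; the meaning
        // of the final `false` argument depends on filter_with_regex.
        std::vector<std::string> raw_words = filter_with_regex(get_words_in(page.url()), ".*\\.html$", false);
        for(int i = 0; i < static_cast<int>(raw_words.size()); ++i) {
            if(!word_conv.count(raw_words[i])) {
                // First occurrence: assign the next id and create the Word entry.
                word_conv[raw_words[i]] = num_words++;
                words.emplace_back(word_conv[raw_words[i]], raw_words[i]);
            }
            // Record the occurrence; i is the index within the filtered word list.
            words[word_conv[raw_words[i]]].is_located_on(page.id(), i);
        }
    }
    return words;
}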
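
The function above depends on `Word`, `Page`, `get_words_in`, and `filter_with_regex`, none of which are defined in this gist. As a rough, self-contained sketch of the same indexing idea, the version below uses hypothetical stand-ins: a `Word` struct with an `(id, text)` constructor and an `is_located_on` method, and a `SimplePage` struct that already holds its extracted words. These definitions are assumptions for illustration only and may differ from the author's real types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the gist's Word type: an id, the text,
// and the (page id, position) pairs where the word occurs.
struct Word {
    int id;
    std::string text;
    std::vector<std::pair<int, int>> locations;

    Word(int id_, std::string text_) : id(id_), text(std::move(text_)) {}

    void is_located_on(int page_id, int position) {
        locations.emplace_back(page_id, position);
    }
};

// Hypothetical stand-in for a page whose words are already extracted
// (the original fetches and filters them from page.url()).
struct SimplePage {
    int id;
    std::vector<std::string> words;
};

// Same deduplication scheme as get_all_words_in, with a single map
// lookup per word instead of several.
std::vector<Word> index_words(const std::vector<SimplePage>& pages) {
    std::vector<Word> words;
    std::map<std::string, int> word_conv;  // word text -> index into `words`
    for (const SimplePage& page : pages) {
        for (int i = 0; i < static_cast<int>(page.words.size()); ++i) {
            const std::string& w = page.words[i];
            auto it = word_conv.find(w);
            if (it == word_conv.end()) {
                // First occurrence: the new id is the current size of `words`.
                it = word_conv.emplace(w, static_cast<int>(words.size())).first;
                words.emplace_back(it->second, w);
            }
            words[it->second].is_located_on(page.id, i);
        }
    }
    return words;
}
```

Because each id equals the word's index in `words`, the separate `num_words` counter is not needed in this variant; `words.size()` plays that role.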