@PolMine
Created November 11, 2019 08:58
word2vec workflow with polmineR
# This code can easily be adapted to train a word2vec model on a corpus
# available to polmineR (here: GERMAPARL). Note that it relies on the package
# [wordVectors](https://github.com/bmschmidt/wordVectors).
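library(polmineR)     # corpus(), split(), get_token_stream()
library(magrittr)     # the %>% pipe
library(wordVectors)  # train_word2vec()
file_out <- path.expand("~/Lab/tmp/germaparl.txt")     # training data: one speech per line
vectors_bin <- path.expand("~/Lab/tmp/germaparl.bin")  # trained vectors (binary format)
# .fn() below appends to file_out, so remove the output of any previous run first
unlink(file_out)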
# Collapse the tokens of one speech into a single line and append it to file_out
.fn <- function(x){
  txt <- stringr::str_c(x, collapse = " ")
  readr::write_lines(txt, file_out, append = TRUE)
}
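# Split the corpus into speeches, get the word tokens of each speech and write
# them to file_out; invisible() suppresses printing the list of NULLs from lapply()
corpus("GERMAPARL") %>%
  split(s_attribute = "speech_id") %>%
  get_token_stream(p_attribute = "word") %>%
  lapply(.fn) %>%
  invisible()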
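# Variant (a sketch, not the original workflow): instead of appending one
# speech at a time, all lines can be built first and written in a single
# call, which also rules out stale output left over from earlier runs.
# The toy list below is hypothetical and stands in for the list of
# character vectors returned by get_token_stream().
toy_streams <- list(
  speech_1 = c("Herr", "Praesident", "!"),
  speech_2 = c("Meine", "Damen", "und", "Herren")
)
toy_file <- tempfile(fileext = ".txt")
# paste() joins the tokens of each speech; writeLines() writes one speech per line
writeLines(vapply(toy_streams, paste, character(1), collapse = " "), toy_file)
readLines(toy_file)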
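# Train 200-dimensional vectors with a context window of 12 tokens and
# 5 training iterations, using 7 threads
train_word2vec(file_out, vectors_bin, vectors = 200, threads = 7,
               window = 12, iter = 5, negative_samples = 0)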
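# Sketch of what a similarity query on the trained vectors computes: words
# are ranked by the cosine similarity of their vectors to the query word's
# vector (wordVectors' closest_to() does this on a model loaded with
# read.vectors(vectors_bin)). The 2-dimensional toy matrix and the helper
# cosine_neighbours() are hypothetical illustrations, not part of wordVectors.
emb <- rbind(
  haus    = c(1.0, 0.1),
  wohnung = c(0.9, 0.2),
  auto    = c(-0.2, 1.0)
)
# Return the n rows of m most similar (by cosine) to the row named `word`
cosine_neighbours <- function(m, word, n = 2) {
  norms <- sqrt(rowSums(m^2))
  sims <- as.vector(m %*% m[word, ]) / (norms * norms[word])
  names(sims) <- rownames(m)
  head(sort(sims, decreasing = TRUE), n)
}
cosine_neighbours(emb, "haus")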