@nassimhaddad
Last active August 29, 2015 14:12
Minimal example of how to load and use word vectors trained with Stanford NLP's GloVe (http://nlp.stanford.edu/projects/glove/), a method for learning word embeddings similar to word2vec.
# download the trained word vectors (~100 MB)
download_to <- tempfile()
download.file('http://www-nlp.stanford.edu/data/glove.6B.50d.txt.gz',
              download_to)
# prepare the data: one word per row, one embedding dimension per column
data <- read.table(download_to, sep = " ", header = FALSE,
                   quote = NULL, comment.char = "", row.names = 1,
                   nrows = -1)
data <- as.matrix(data)
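# optional sanity check (an addition, not in the original gist): the 50d file
# should yield a numeric matrix with 50 columns and one row per vocabulary word
dim(data)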
# install and load the FNN package for fast k-nearest-neighbour search
if (!require(FNN)){
  install.packages("FNN")
  require(FNN)
}
allwords <- row.names(data)
# helper: return the k words whose vectors are closest to x (Euclidean distance)
get_closest <- function(x, k = 10){
  knns <- get.knnx(data, t(x), k = k)
  data.frame(words = allwords[knns$nn.index],
             dist = as.vector(knns$nn.dist))
}
# find closest words
get_closest(data["wine",])
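# the query word itself comes back as the nearest neighbour (distance ~0);
# any word in the vocabulary can be looked up the same way, for example
# (an illustrative word choice, not from the original gist):
get_closest(data["beer",])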
# word analogies via vector arithmetic on the embeddings
comp <- data["king",] - data["son",] + data["daughter",]
get_closest(comp)
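# optional variant (a sketch added here, not part of the original gist):
# GloVe vectors are usually compared with cosine similarity rather than raw
# Euclidean distance; normalising every row to unit length makes Euclidean
# k-nearest-neighbour search rank neighbours the same way cosine similarity
# would (for unit vectors, dist^2 = 2 * (1 - cosine similarity))
data_norm <- data / sqrt(rowSums(data^2))
get_closest_cosine <- function(x, k = 10){
  x <- x / sqrt(sum(x^2))
  knns <- get.knnx(data_norm, t(x), k = k)
  data.frame(words = allwords[knns$nn.index],
             dist = as.vector(knns$nn.dist))
}
get_closest_cosine(data["wine",])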
# cleanup: delete the temporary file
file.remove(download_to)