@vsimko
Last active November 21, 2017 16:04
library(lsa)
# create three small sample documents (one word per line) in a temp dir
td <- tempfile()
dir.create(td)
write(c("dog", "cat", "mouse"), file = file.path(td, "D1"))
write(c("ham", "mouse", "sushi"), file = file.path(td, "D2"))
write(c("dog", "pet", "pet"), file = file.path(td, "D3"))
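data(stopwords_en) # English stopword list shipped with lsa
# build the term-document matrix (rows = terms, columns = documents)
myMatrix0 <- textmatrix(td, stopwords = stopwords_en)
unlink(td, recursive = TRUE) # temp dir not needed anymore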
# weight the counts: local log term frequency (tf) times global inverse document frequency (idf)
myMatrix <- lw_logtf(myMatrix0) * gw_idf(myMatrix0)
myLSAspace <- lsa(myMatrix, dims = 2) # use just two dimensions for rendering
# plot documents (dk) in red and terms (tk) in blue; type = "n" only sets up the axes
plot(rbind(myLSAspace$tk, myLSAspace$dk), type = "n")
text(myLSAspace$dk, labels = rownames(myLSAspace$dk), col = "red")
text(myLSAspace$tk, labels = rownames(myLSAspace$tk), col = "blue")
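
# --- Sketch: the tf-idf weighting step above, reproduced in base R ---
# This is an illustration, not part of the original gist. It assumes that
# lw_logtf(m) is log(m + 1) and that gw_idf(m) is log2(ncol(m) / df) + 1,
# where df is the number of documents containing each term. That is my
# understanding of the lsa package; check ?lw_logtf and ?gw_idf to confirm.
# The names toyCounts, localW, docFreq, globalW and toyWeighted are
# hypothetical and used only in this sketch.
toyCounts <- matrix(
  c(1, 0, 1,   # dog
    0, 0, 2,   # pet
    1, 1, 0),  # mouse
  nrow = 3, byrow = TRUE,
  dimnames = list(c("dog", "pet", "mouse"), c("D1", "D2", "D3"))
)
localW <- log(toyCounts + 1)                 # dampened term frequency per cell
docFreq <- rowSums(toyCounts > 0)            # documents containing each term
globalW <- log2(ncol(toyCounts) / docFreq) + 1  # rarer terms get a larger weight
# globalW has one entry per row, so recycling scales row i by globalW[i]
toyWeighted <- localW * globalW
print(round(toyWeighted, 3))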