@PolMine
Created August 16, 2018 15:51
Get cooccurrence similarity
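library(polmineR)      # count(), cooccurrences(), as.bundle(), as.TermDocumentMatrix()
library(cooccurrences)
library(magrittr)      # %>%
library(data.table)    # setkeyv(), setorderv()
library(pbapply)
library(coop)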
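# `df` is not defined in this snippet; it is assumed to hold the candidate query terms.
issues <- df %>% unlist() %>% unname() %>% as.character() %>% unique()
# Count each term in the corpus and sort by descending frequency.
dt <- count("GERMAPARL", issues) %>%
  setkeyv("count") %>%
  setorderv(cols = "count", order = -1L)
# Keep terms occurring more than 100 times, then re-encode them as UTF-8.
issues_min <- dt[count > 100][["query"]]
issues_min <- iconv(issues_min, from = "latin1", to = "UTF-8")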
# Compute cooccurrences for each term within a window of 15 tokens on either side.
li <- pblapply(
  issues_min,
  function(x) cooccurrences("GERMAPARL", query = x, left = 15, right = 15)
)
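# Bundle the results and build a term-document matrix of log-likelihood ("ll") scores.
bu <- as.bundle(li)
tdm <- as.TermDocumentMatrix(bu, col = "ll")
colnames(tdm) <- issues_min
m <- as.matrix(tdm)
# Cosine similarity between columns (query terms), converted to a distance.
cosim <- coop::cosine(m)
d <- proxy::pr_simil2dist(cosim)
# Project onto two dimensions with classical MDS and plot the term labels.
y <- cmdscale(d = d, k = 2)
plot(x = y[, 1], y = y[, 2], type = "n")
text(x = y[, 1], y = y[, 2], labels = rownames(y), cex = 0.5)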