@seaslee
Last active December 24, 2015 21:29
library(tm)
library(wordcloud)     # provides wordcloud()
library(RColorBrewer)  # provides brewer.pal()
## read from txt file
path <- 'd:/sigir_full.txt'
## readLines() opens and closes the file itself when given a path
con <- readLines(path)
## get the titles from the content
paperTitles <- con[grepl("^Title: ", con)]
## strip only the leading "Title: " prefix so titles that contain a colon stay whole
paperTitles <- sub("^Title: ", "", paperTitles)
## build the term-document matrix
paperTitlesCorpus <- Corpus(VectorSource(paperTitles))
ptTermMatrix <- TermDocumentMatrix(paperTitlesCorpus,
                                   control=list(
                                     stopwords=TRUE,
                                     removePunctuation=TRUE,
                                     removeNumbers=TRUE))
## print terms that occur at least 4 times
print(findFreqTerms(ptTermMatrix, lowfreq=4))
## word cloud
m <- as.matrix(ptTermMatrix)
v <- sort(rowSums(m), decreasing=TRUE)
myNames <- names(v)
d <- data.frame(word=myNames, freq=v)
png(file.path(getwd(), "sigir13.png"),
    width=10, height=10,
    units="in", res=700)
## set the margins after opening the device so they apply to the PNG
par(mar = rep(2, 4))
pal2 <- brewer.pal(8,"Dark2")
wordcloud(d$word, d$freq, scale=c(5, .2), min.freq=mean(d$freq),
          max.words=80, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()