@suryadutta
Created November 14, 2017 03:23
Reduce the size of a document-term matrix (DTM) to make computation faster
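library(slam)  # row_sums, col_sums
library(tm)    # nDocs

reduceDTM <- function(dtm) {
  # Mean tf-idf per term: mean length-normalized term frequency times log2 idf
  term_tfidf <-
    tapply(dtm$v / row_sums(dtm)[dtm$i], dtm$j, mean) *
    log2(nDocs(dtm) / col_sums(dtm > 0))
  # Keep only terms whose mean tf-idf is at or above the median
  dtm <- dtm[, term_tfidf >= median(term_tfidf)]
  # Drop documents left with no remaining terms
  dtm <- dtm[row_sums(dtm) > 0, ]
  return(dtm)
}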
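
To make the filtering step easier to follow, here is a minimal base-R sketch of the same calculation on a small dense matrix. The `counts` matrix and its document and term names are made-up illustration data, not part of the gist, and the sketch uses `rowSums`/`colSums` on an ordinary matrix instead of the sparse `slam` helpers, so treat it as an approximation of what `reduceDTM` does rather than a drop-in replacement.

```r
# Hypothetical toy counts: rows are documents, columns are terms
counts <- matrix(c(2, 1, 0, 0,
                   0, 1, 1, 0,
                   1, 0, 2, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("apple", "banana", "cherry", "date")))

# Term frequency normalized by document length (each row divided by its sum)
tf <- counts / rowSums(counts)

# Number of documents containing each term
doc_freq <- colSums(counts > 0)

# Mean tf over the documents that contain the term, mirroring the
# tapply() over non-zero entries in the sparse version
mean_tf <- colSums(tf) / doc_freq

# Inverse document frequency, base 2
idf <- log2(nrow(counts) / doc_freq)

term_tfidf <- mean_tf * idf

# Keep terms at or above the median score, then drop empty documents
reduced <- counts[, term_tfidf >= median(term_tfidf), drop = FALSE]
reduced <- reduced[rowSums(reduced) > 0, , drop = FALSE]

print(term_tfidf)
print(reduced)
```

Because the cutoff is the median, roughly half of the terms are kept, and any document that contained only low-scoring terms disappears from the result.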