@sAbakumoff
Last active September 18, 2016 05:56
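library(dplyr)
library(tidytext)
library(ggplot2)
# step 1: remove rows containing the words "merge", "pull" and "request", as they are useless for the analysis
custom_stop_words <- data.frame(word = c("merge", "pull", "request"), stringsAsFactors = FALSE)
tidy_commits <- tidy_commits %>% anti_join(custom_stop_words, by = "word")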
#step 2: calculate tf-idf and sort the rows by its value in descending order
commit_words <- tidy_commits %>%
  count(name, word, sort = TRUE) %>%
  ungroup() %>%
  bind_tf_idf(word, name, n) %>%
  arrange(desc(tf_idf)) %>%
  mutate(word = reorder(word, tf_idf))
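# step 3: plot a bar chart of the 20 highest tf-idf words for React
commit_words %>%
  filter(name == "react") %>%
  top_n(20, tf_idf) %>%
  mutate(word = reorder(word, tf_idf)) %>%  # order the bars by React's own tf-idf values
  ggplot(aes(word, tf_idf)) +
  geom_bar(stat = "identity", fill = "#53d2fa") +
  labs(y = "tf-idf", title = "Highest tf-idf words in React Commit Messages") +
  coord_flip()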