@tmastny
Created May 26, 2020 23:57
library(dplyr)
library(janeaustenr)
library(tidytext)
library(tidylo)
library(ggplot2)
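# Tokenize every Austen novel into bigrams (overlapping two-word sequences)
tidy_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # Lines with fewer than two words can yield NA bigrams in recent tidytext
  # versions; drop them so they are not counted as a feature
  filter(!is.na(bigram))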
# bigrams across books
tidy_bigrams
# Count how often each bigram occurs in each book
bigram_counts <- tidy_bigrams %>%
  count(book, bigram, sort = TRUE)
bigram_counts
# Add a log_odds_weighted column: set = book, feature = bigram, count = n
bigram_log_odds <- bigram_counts %>%
  bind_log_odds(book, bigram, n)
bigram_log_odds
# Plot the bigrams with the highest weighted log odds in each book
bigram_log_odds %>%
  group_by(book) %>%
  slice_max(log_odds_weighted, n = 10) %>%
  ungroup() %>%
  mutate(bigram = reorder(bigram, log_odds_weighted)) %>%
  ggplot(aes(bigram, log_odds_weighted, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, scales = "free") +
  coord_flip() +
  labs(x = NULL)