@MattSandy
Created September 3, 2019 15:52
Gets the subreddits that made it to the front page the most often
library(tidyverse)
library(jsonlite)
library(plotly)
library(ggthemes) # provides scale_color_tableau(), used in the plot below
# Drop list-columns (nested JSON fields) so the pages can be row-bound later
drop_lists <- function(df) {
  for(column in names(df)) {
    if(typeof(df[[column]])=="list") {
      # Remove column
      df[[column]] <- NULL
    }
  }
  return(df)
}
# Get posts from front page, one page of listings per request
posts <- list() # empty list
after <- "" # empty character string
for(i in 1:40) {
  url <- paste0("https://www.reddit.com/.json?after=",after)
  result <- fromJSON(url)
  # Column 'edited' was throwing an error in bind_rows():
  # Error: Column `edited` can't be converted from numeric to logical
  # (its type apparently differs between pages), so drop that column too
  posts[[i]] <- result$data$children$data %>%
    drop_lists %>% select(-edited)
  # The name of the last post is the cursor for the next page
  after <- posts[[i]]$name %>% tail(1)
  print(url)
}
df <- posts %>% bind_rows
# Clean up workspace
rm(list=c("after","result","url","i"))
# Top 10 subreddits to make it to the front page
# Possible targets
top_10 <- table(Subreddit = df$subreddit) %>%
  data.frame %>% top_n(n = 10, wt = Freq)
# Note: top_10 may contain more than 10 rows...
# ...if several subreddits tie with the 10th-highest Freq
print(top_10)
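# A minimal sketch, not part of the original analysis: if exactly n rows are
# needed, dplyr's slice_max() (dplyr >= 1.0.0) can drop ties via with_ties = FALSE.
# `toy_counts` is made-up data used only to show the difference.
library(dplyr)
toy_counts <- data.frame(Subreddit = c("a", "b", "c", "d"),
                         Freq = c(5, 3, 3, 1))
# top_n() keeps every row tied for 2nd place, so 3 rows come back
toy_top_n <- toy_counts %>% top_n(n = 2, wt = Freq)
# slice_max() with with_ties = FALSE returns exactly 2 rows
toy_slice <- toy_counts %>% slice_max(Freq, n = 2, with_ties = FALSE)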
# Density of post scores (log scale) for the top subreddits
p1 <- ggplot(df %>% filter(subreddit %in% top_10$Subreddit),
             aes(x = score, color = subreddit)) +
  geom_density() + scale_x_log10() + scale_color_tableau()
p1
# Lots of colors are hard to differentiate
# This makes it easier because you can hover
ggplotly(p1)