@grimbough
Created April 13, 2022 13:18
Plotting download counts of various R versions from the RStudio CRAN mirror
library(tidyverse)
library(lubridate)
library(scales)
## generate names for log files
start <- as.Date('2013-01-01')
today <- as.Date('2022-04-10')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
## place to store the log files
log_dir <- file.path(tempdir(), "cran_logs")
dir.create(log_dir, showWarnings = FALSE)
## the download step sometimes fails part-way, so work out which log
## files are still missing and only (re-)try those
missing_days <- setdiff(as.character(all_days), strtrim(tools::file_path_sans_ext(list.files(log_dir), TRUE), 10))
if (length(missing_days) > 0) {
  urls <- paste0('http://cran-logs.rstudio.com/', as.POSIXlt(missing_days)$year + 1900, '/', missing_days, '-r.csv.gz')
  ## download the files - re-run from 'missing_days' if some downloads fail.
  ## passing a vector of URLs requires method = "libcurl"
  download.file(urls, destfile = file.path(log_dir, basename(urls)), method = "libcurl", quiet = TRUE)
}
## read all files
logs <- readr::read_csv(file = list.files(log_dir, pattern = "\\.csv\\.gz$", full.names = TRUE),
                        col_types = "Dtdccci",
                        progress = FALSE)
downloads <- logs %>%
  ## truncate version to whatever comes before the first period '.'
  mutate(version = gsub(version, pattern = "^([[:alnum:]]*).*", replacement = "\\1")) %>%
  group_by(month = floor_date(date, unit = "month"), version) %>%
  summarise(downloads = n(), .groups = "drop")
## calculate a 14-day rolling mean for the proportion of unique IP addresses
ips <- group_by(logs, date) %>%
  summarise(prop_uniq = length(unique(ip_id)) / n()) %>%
  mutate(prop_uniq_s = RcppRoll::roll_meanr(prop_uniq, 14))
## create plot
ggplot() +
  geom_bar(data = downloads,
           aes(x = month, y = downloads, fill = version),
           stat = "identity") +
  geom_line(data = ips,
            aes(y = prop_uniq_s * 1e6, x = date),
            alpha = 0.5, lwd = 1.5) +
  scale_x_date(name = "Date") +
  scale_y_continuous(name = "Number of downloads",
                     labels = comma,
                     sec.axis = sec_axis(trans = ~ . / 1e6, name = "Proportion of unique IPs")) +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()