
@patperry
Last active September 20, 2017 22:06
Strange CRAN download behavior
# I released version 0.9.0 of corpus on 2017-08-19
# (http://corpustext.com/)
# Here are the download logs from the day after
tmp <- tempfile(fileext = ".csv.gz")
download.file("http://cran-logs.rstudio.com/2017/2017-08-20.csv.gz", tmp)
data <- read.csv(tmp, stringsAsFactors = FALSE)
# I'm going to look at "corpus" downloads from Great Britain
corpus_gb <- subset(data, package == "corpus" & country == "GB")
# Here are downloaded version numbers and the IP addresses of the downloaders
print(table(corpus_gb[, c("version", "ip_id")]))
#        ip_id
# version 158 183 494 677 701 10136 18653
#   0.2-0  12  15  18   7  17     0     0
#   0.3.0  12  15  18   7  17     0     0
#   0.3.1  12  15  18   7  17     0     0
#   0.4.0  12  15  18   7  17     0     0
#   0.5.0  12  15  18   7  17     0     0
#   0.5.1  12  15  18   7  17     0     0
#   0.6.0  12  15  18   7  17     0     0
#   0.7.0  12  15  18   7  17     0     0
#   0.8.0  24  30  36  14  34     0     1
#   0.9.0  24  30  36  14  34     0     0
#   0.9.1   2   4   2   0   2     1     0
# WTF? 5 IP addresses account for 838 downloads. I see similar behavior for every release.
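# A minimal sketch for checking the 838 figure without re-downloading the log.
# The counts are retyped by hand from the table printed above. The names
# `counts`, `per_ip`, and `heavy`, and the cutoff of 10 downloads, are
# illustrative assumptions and not part of the original script.

```r
# Counts copied from the printed table (rows: version, columns: ip_id).
versions <- c("0.2-0", "0.3.0", "0.3.1", "0.4.0", "0.5.0", "0.5.1",
              "0.6.0", "0.7.0", "0.8.0", "0.9.0", "0.9.1")
ip_ids <- c("158", "183", "494", "677", "701", "10136", "18653")
counts <- matrix(
  c(rep(c(12, 15, 18, 7, 17, 0, 0), 8),  # 0.2-0 through 0.7.0 (identical rows)
    24, 30, 36, 14, 34, 0, 1,            # 0.8.0
    24, 30, 36, 14, 34, 0, 0,            # 0.9.0
     2,  4,  2,  0,  2, 1, 0),           # 0.9.1
  nrow = length(versions), byrow = TRUE,
  dimnames = list(version = versions, ip_id = ip_ids)
)

# Total downloads per IP address, summed over all versions
per_ip <- colSums(counts)
print(per_ip)

# IPs above an arbitrary cutoff of 10 downloads (assumption, for illustration)
heavy <- per_ip[per_ip > 10]
print(length(heavy))  # number of heavy downloaders
print(sum(heavy))     # downloads attributable to them
print(sum(counts))    # all GB downloads of corpus that day
```

# With the live data frame, table(corpus_gb$ip_id) should give the same
# per-IP totals as colSums(counts) does here.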