@benmarwick
Last active September 27, 2022 20:17
How many archaeology papers with R as of Sept 2021?
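# How many articles were on the list in Sept 2021?
# First, run the lines from archaepaperswithcode.R that create `repo`,
# then:
library(git2r)   # lookup(), tree(), content()
library(stringr) # str_detect(), str_which(), str_extract(), ...
library(dplyr)   # filter(), %>%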
## Coerce commits to a data.frame
df <- as.data.frame(repo)
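# keep rows of commits from Sept 2021 and take the sha of the first one listed;
# this is the last commit of that month, assuming df lists commits newest first
the_sha <- df[str_detect(df$when, "2021-09"), ]$sha[1]
# look up the commit object for that sha
that_commit <- lookup(repo, the_sha)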
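# A hedged sketch, not part of the original workflow: `[1]` above picks the
# last commit of the month only if the rows are ordered newest first. To avoid
# relying on row order, the latest commit can be chosen by timestamp instead.
# toy_log is an invented stand-in for df; its values are made up.
toy_log <- data.frame(
  sha  = c("c3", "b2", "a1"),
  when = as.POSIXct(c("2021-09-28 12:00:00", "2021-09-30 18:00:00",
                      "2021-09-02 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
# keep Sept 2021 rows, then take the sha with the largest timestamp
toy_sept <- toy_log[format(toy_log$when, "%Y-%m") == "2021-09", ]
toy_last_sha <- toy_sept$sha[which.max(as.numeric(toy_sept$when))]
toy_last_sha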
# get contents of that commit, specifically the README.md file
content_at_that_commit <- content(tree(that_commit)["README.md"])
# analyse the text to find out how many articles are in that file
archy_ctv_readme <- content_at_that_commit
# these lines come from archaepaperswithcode.R
archy_ctv_readme_start <- str_which(archy_ctv_readme,
                                    " Publications that include R code")
archy_ctv_readme <-
  archy_ctv_readme[archy_ctv_readme_start:(length(archy_ctv_readme) - 3)]
# get all dates of publication
archy_ctv_readme <- str_remove_all(archy_ctv_readme, "[[:punct:]]")
archy_ctv_readme_20XX <- str_extract(archy_ctv_readme, " 20[[:digit:]]{2} ")
archy_ctv_readme_20XX <- str_squish(unlist(archy_ctv_readme_20XX))
archy_ctv_readme_20XX <- as.numeric(archy_ctv_readme_20XX)
archy_ctv_readme_20XX <- archy_ctv_readme_20XX[!is.na(archy_ctv_readme_20XX)]
number_of_reproducible_articles <- length(archy_ctv_readme_20XX)
number_of_reproducible_articles
# 204 at the end of Sept 2021
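# A minimal sketch of the counting step above on a made-up README excerpt, so
# the logic can be checked without the repo. The entries in toy_readme are
# invented for illustration and are not taken from the real list.
# Note that str_extract() keeps only the first match per line, so each line
# contributes at most one year to the count.
library(stringr)
toy_readme <- c(
  "- Author A. (2019). A made-up title. *Journal X*.",
  "- Author B. (2021). Another made-up title. *Journal Y*.",
  "A line with no publication year."
)
toy_clean <- str_remove_all(toy_readme, "[[:punct:]]")
toy_years <- str_extract(toy_clean, " 20[[:digit:]]{2} ")
toy_years <- as.numeric(str_squish(toy_years))
toy_years <- toy_years[!is.na(toy_years)]
length(toy_years)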
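# If we just want to know how many papers were added from Jan 2021 to Sept 2021,
# here's how to get that. Note that this counts commits in that period, so it
# matches the number of papers only if each commit adds one paper.
# filter rows of commits from Jan 2021 to Sept 2021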
time_span <- paste0("2021", "-0", 1:9)
time_span_df <-
  df %>%
  filter(str_detect(when, paste0(time_span, collapse = "|")))
nrow(time_span_df)
# 81
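# A minimal sketch of the month filter above, using a made-up data frame in
# place of the real commit log (toy_commits and its values are invented).
# It shows the regular expression that paste0(..., collapse = "|") builds and
# that commits outside Jan-Sept 2021 are dropped.
library(stringr)
toy_commits <- data.frame(
  sha  = c("a1", "b2", "c3", "d4"),
  when = c("2020-12-30 10:00:00", "2021-01-05 09:00:00",
           "2021-09-30 23:00:00", "2021-10-01 08:00:00"),
  stringsAsFactors = FALSE
)
# one alternative per month, "2021-01" through "2021-09", joined with "|"
toy_pattern <- paste0(paste0("2021", "-0", 1:9), collapse = "|")
toy_pattern
toy_in_span <- toy_commits[str_detect(toy_commits$when, toy_pattern), ]
nrow(toy_in_span)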