The aim of this post is to track how life science research papers use open-source software hosting facilities to disclose DVCS (distributed version control system) repositories.
Our sample consists of the following five hosting services, as listed on Wikipedia:
```r
# Read the hosting services and their base URLs
my.urls <- read.csv("dvcs_url.csv", header = TRUE, sep = ",")
print(my.urls)
##          dvcs        base.url
## 1      GitHub      github.com
## 2   BitBucket   bitbucket.org
## 3  GoogleCode code.google.com
## 4   LaunchPad   launchpad.net
## 5 SourceForge sourceforge.net
```
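The post does not show `dvcs_url.csv` itself. Judging from the printed output, it holds two columns, `dvcs` and `base.url`; the sketch below assumes that layout and recreates the file inline with `read.csv(text = ...)`, so it runs without the file. Passing `stringsAsFactors = FALSE` keeps `base.url` as a character vector.

```r
# Assumed contents of dvcs_url.csv, inferred from the printed table above
csv_text <- "dvcs,base.url
GitHub,github.com
BitBucket,bitbucket.org
GoogleCode,code.google.com
LaunchPad,launchpad.net
SourceForge,sourceforge.net"

# stringsAsFactors = FALSE keeps both columns as character vectors
my.urls <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
print(my.urls)
```

With character columns from the start, the later `as.character()` conversion becomes a harmless no-op.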
We used Europe PMC as the literature corpus and searched for the URL patterns within the subset of publications for which Europe PMC holds the full text. For this, we used the rebi package from rOpenSci.
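```r
library(rebi)  # Europe PMC client from rOpenSci
library(plyr)

# Search the Europe PMC full-text subset once per base URL
my.urls$base.url <- as.character(my.urls$base.url)
my.data <- lapply(my.urls$base.url, search_publications, dataset = c("fulltext"))
names(my.data) <- my.urls$base.url

# Stack the per-host results; ldply() stores each list name in a .id column
my.data <- ldply(my.data, rbind)
```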
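The `ldply()` call turns the named list of per-host result tables into one long data frame and records each list name in a `.id` column. A base-R equivalent is sketched below on a hypothetical toy list (`toy.results`, with made-up IDs and years rather than Europe PMC results), so it runs without network access; the column names `id` and `pubYear` mirror those used later in the post.

```r
# Hypothetical toy search results: one data frame per host, named by base URL
toy.results <- list(
  "github.com"      = data.frame(id = c("A1", "A2"), pubYear = c(2012, 2013)),
  "sourceforge.net" = data.frame(id = c("A2", "A3"), pubYear = c(2013, 2010))
)

# Prefix each table with a .id column holding its host, then stack them,
# mirroring what plyr::ldply(toy.results, rbind) produces
stacked <- do.call(rbind, Map(function(host, df) cbind(.id = host, df),
                              names(toy.results), toy.results))
rownames(stacked) <- NULL
stacked
```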
Counting unique publication IDs, we found 3190 publications that reference at least one open-source software hosting service:

```r
length(unique(my.data$id))
## [1] 3190
```
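Counting unique IDs rather than rows matters because the stacked data frame holds one row per host match, so a paper that mentions two hosting services appears twice. On the hypothetical toy data below (not Europe PMC results), four rows reduce to three distinct publications.

```r
# Hypothetical stacked results: publication A2 matched two hosts
toy.data <- data.frame(
  .id = c("github.com", "github.com", "sourceforge.net", "sourceforge.net"),
  id  = c("A1", "A2", "A2", "A3"),
  stringsAsFactors = FALSE
)
nrow(toy.data)               # one row per host match: 4
length(unique(toy.data$id))  # distinct publications: 3
```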
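Table 1 ranks the hosting services by the number of PMC publications found. Because a single publication can mention more than one service, the counts sum to 3345, more than the 3190 unique publications.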
Hosting service | PMC publications found |
---|---|
sourceforge.net | 1495 |
code.google.com | 914 |
github.com | 832 |
bitbucket.org | 88 |
launchpad.net | 16 |
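The post does not show how Table 1 was computed. One way, sketched here on hypothetical toy data in the same shape as `my.data`, is to tabulate the `.id` column, sort it in decreasing order, and print the rows in the same pipe-table layout. This assumes each publication appears at most once per host in the search results.

```r
# Hypothetical stacked results (same columns as my.data in the post)
toy.data <- data.frame(
  .id = c("sourceforge.net", "github.com", "sourceforge.net", "bitbucket.org"),
  id  = c("A1", "A2", "A3", "A2"),
  stringsAsFactors = FALSE
)

# Count publications per hosting service, most frequent first
host_counts <- sort(table(toy.data$.id), decreasing = TRUE)

# Print in the pipe-table layout used for Table 1
cat("Hosting service | PMC publications found |\n---|---|\n")
cat(paste0(names(host_counts), " | ", as.integer(host_counts), " |\n"), sep = "")
```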
Figure 1 plots the yearly number of PMC publications mentioning each DVCS hosting service from 2009 to 2013. Please note that the data were gathered on 2014-05-23 16:12:24 CEST.
```r
library(ggplot2)
# Keep publication years 2009 to 2013
my.data <- my.data[my.data$pubYear > 2008 & my.data$pubYear < 2014, ]
# Order hosts by total frequency so the legend lists the most common first
my.data$.id <- factor(my.data$.id,
                      levels = names(sort(table(my.data$.id), decreasing = TRUE)))
# Long format: one row per (year, host) with its publication count
my.df <- as.data.frame(table(unlist(my.data$pubYear), my.data$.id))
ggplot(my.df, aes(Var1, Freq, group = Var2)) +
  geom_line(aes(colour = Var2)) + geom_point() + theme_bw() +
  scale_colour_brewer("DVCS Host", palette = 2, type = "qual") +
  xlab("Year") + ylab("PMC article disclosure") +
  # theme() and element_rect() replace the removed opts() and theme_rect()
  theme(legend.key = element_rect(fill = "white", colour = "white"))
```
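The plotting step filters to 2009 to 2013 and hands ggplot a long data frame with one row per year and host. Calling `as.data.frame()` on a two-way table yields exactly that layout, with columns `Var1` (year), `Var2` (host) and `Freq`, including zero counts for years in which a host was not mentioned. The toy data below are hypothetical.

```r
# Hypothetical stacked results with publication years
toy.data <- data.frame(
  .id     = c("github.com", "github.com", "sourceforge.net",
              "sourceforge.net", "github.com"),
  pubYear = c(2012, 2013, 2008, 2012, 2014),
  stringsAsFactors = FALSE
)

# Keep publication years 2009 to 2013, as in the post
toy.data <- toy.data[toy.data$pubYear > 2008 & toy.data$pubYear < 2014, ]

# Long format: one row per (year, host) combination, zero counts included
toy.df <- as.data.frame(table(toy.data$pubYear, toy.data$.id))
toy.df
```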
Our data suggest that GitHub is gaining importance for data and code disclosure in the life sciences relative to other DVCS hosting services.
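Raw yearly counts also grow with the size of the full-text corpus, so one way to check a shift towards GitHub is to look at its share of all host mentions per year. The sketch below computes that share with `prop.table()` on hypothetical toy data; this share-based view is an alternative to the raw counts plotted in Figure 1, not the method used there.

```r
# Hypothetical host mentions per publication year
toy.data <- data.frame(
  .id     = c("sourceforge.net", "sourceforge.net", "github.com",
              "sourceforge.net", "github.com", "github.com"),
  pubYear = c(2010, 2010, 2010, 2013, 2013, 2013),
  stringsAsFactors = FALSE
)

# Row-wise proportions: each year's mentions split across hosts (rows sum to 1)
shares <- prop.table(table(toy.data$pubYear, toy.data$.id), margin = 1)
round(shares[, "github.com"], 2)
```

Here GitHub's share rises from one third of mentions in 2010 to two thirds in 2013, even though total mentions per year stay the same.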