@DASpringate
DASpringate / web_mining_example.R
Last active August 29, 2015 14:00
Web mining example: Scraping web data to look for a new job.
require(XML)
require(RCurl)
require(stringr)
require(rentrez)
require(rjson)
require(reshape2)
require(ggmap)
require(mapproj)
require(devtools)
DASpringate / cond.R
Last active May 19, 2017 15:24
A simple R implementation of `cond` from Lisp. Allows for arbitrary numbers of conditionals without nested if statements
#' R implementation of `cond` from Lisp.
#' Allows for arbitrary numbers of conditionals without ugly nested if statements.
#' Conditionals are entered as pairs of expressions (clauses):
#' first the expression to be evaluated, then the return value if that expression is true.
#' @param ... an even number of expressions as pairs (see example)
#' @param true the `else` expression (taken from Lisp's `(T resultN)` clause; see http://www.cis.upenn.edu/~matuszek/LispText/lisp-cond.html)
#' @return The paired value of the first true conditional expression or the value of true
#' @examples
#' x <- runif(1)
#' cond(x < 0.2, "lower tail",
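The roxygen comments above describe `cond` as taking test/value pairs plus a `true` fallback. A minimal sketch of how such a function might be written is given here. It is not the gist's code: the name `cond_sketch` and the choice to capture clauses with `substitute` so that only the needed expressions are evaluated are assumptions made for illustration.

```r
# Hypothetical sketch, not the gist's implementation.
cond_sketch <- function(..., true = NULL) {
  # Capture the clauses unevaluated so later clauses are never run needlessly
  clauses <- as.list(substitute(list(...)))[-1]
  if (length(clauses) %% 2 != 0) {
    stop("clauses must be supplied as test/value pairs")
  }
  env <- parent.frame()
  for (i in seq_len(length(clauses) / 2)) {
    test <- clauses[[2 * i - 1]]
    value <- clauses[[2 * i]]
    # Return the value paired with the first test that evaluates to TRUE
    if (isTRUE(eval(test, env))) {
      return(eval(value, env))
    }
  }
  # No test matched: fall back to the `else` expression
  true
}

x <- 0.5
cond_sketch(x < 0.2, "lower tail",
            x > 0.8, "upper tail",
            true = "middle")  # "middle"
```

Evaluating in the caller's frame keeps variables such as `x` visible to each clause, and an odd number of clauses is rejected up front rather than silently dropping the last one.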
DASpringate / simulated_insect_data
Last active January 11, 2017 07:47
R code to produce a simulated dataset for an experiment on a made up insect. Measures include sex, body length, thorax width, number of thoracic bristles and some measure of aggression behaviour. Also there is exposure to some treatment stimulus/drug. This simulation uses Copulas to generate correlated variables from binomial, Gaussian and Poiss…
# R code to produce a simulated dataset for an experiment on a made up insect.
# Measures include sex, body length, thorax width, number of thoracic bristles and some measure of aggression behaviour.
# Also there is exposure to some treatment stimulus/drug.
# This simulation uses Copulas to generate correlated variables from binomial, Gaussian and Poisson distributions
require(copula)
set.seed(1888)
n <- 1000
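The preview stops before the simulation itself. As a rough illustration of the idea named in the comments (correlated variables with binomial, Gaussian and Poisson margins), the sketch below builds a Gaussian copula by hand in base R instead of using the `copula` package. The correlation matrix, the marginal parameters and the column names are all assumed values, not taken from the gist.

```r
set.seed(1)
n_obs <- 1000

# Assumed correlation structure on the latent normal scale (illustrative values)
rho <- matrix(c(1.0, 0.5, 0.3,
                0.5, 1.0, 0.4,
                0.3, 0.4, 1.0), nrow = 3)

# Correlated standard normals: independent draws times the Cholesky factor
z <- matrix(rnorm(n_obs * 3), ncol = 3) %*% chol(rho)

# Map to uniform margins; the dependence between columns is preserved
u <- pnorm(z)

# Push each uniform column through the quantile function of its target margin
sim <- data.frame(
  sex         = qbinom(u[, 1], size = 1, prob = 0.5),  # binomial margin
  body_length = qnorm(u[, 2], mean = 10, sd = 2),      # Gaussian margin
  bristles    = qpois(u[, 3], lambda = 6)              # Poisson margin
)

head(sim)
```

The correlations among the simulated columns will generally be somewhat weaker than those in `rho`, because the discrete margins cannot carry the latent correlation exactly.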
DASpringate / Samatha_example_blog_setup
Created September 6, 2013 14:07
A quick setup of an example blog using the Samatha static site engine
require(devtools)
load_all(".")
site <- "/home/me/mysite"
skeleton(site)
setup_example_site(site)
## Buggy: samatha() has to be run twice to get the tags to work
samatha(site, rss = FALSE, initial = TRUE)
samatha(site, rss = FALSE, initial = TRUE)
DASpringate / fast_rbind
Last active December 19, 2015 15:28
A faster, multicore rbind function. Still way slower than rbindlist, but it keeps all of the checks in rbind and returns a data frame, not a data.table
fast_rbind <- function(to_bind, cores = 12, splits = 10){
    # to_bind : list of data frames to bind
    require(parallel) # provides mclapply; supersedes the archived multicore package
    # chunk start indices, with the list length appended as the final boundary
    myseq <- c(seq(from = 1, to = length(to_bind),
                   by = floor(length(to_bind) / max(cores, splits))),
               length(to_bind))
    intermediates <- mclapply(seq_along(myseq),
                              function(x) {
                                  if(x != length(myseq)){
                                      do.call(`rbind`, to_bind[myseq[x]:(myseq[x+1] -1)])
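The preview cuts off inside the worker function, so the rest of the author's code is not shown. The general pattern the description suggests (bind chunks of the list in parallel, then bind the intermediate results) might look like the sketch below. The name `chunked_rbind`, the chunking formula and the default of one core are assumptions; `mclapply` relies on forking, so more than one core is not available on Windows.

```r
library(parallel)  # mclapply

# Hypothetical sketch of a chunked rbind, not the gist's implementation.
chunked_rbind <- function(to_bind, cores = 1, splits = 10) {
  n <- length(to_bind)
  if (n == 0) return(data.frame())
  # Never ask for more chunks than there are data frames
  n_chunks <- min(n, max(cores, splits))
  # Assign each list position to a contiguous chunk number 1..n_chunks
  chunk_id <- ceiling(seq_len(n) * n_chunks / n)
  index_chunks <- split(seq_len(n), chunk_id)
  # Bind each chunk separately, in parallel when cores > 1
  intermediates <- mclapply(index_chunks, function(idx) {
    do.call(rbind, unname(to_bind[idx]))
  }, mc.cores = cores)
  # Bind the per-chunk results; unname so chunk labels do not leak into row names
  do.call(rbind, unname(intermediates))
}

dfs <- lapply(1:25, function(i) data.frame(id = i, value = i * 2))
result <- chunked_rbind(dfs)
nrow(result)
```

Because every step still goes through `rbind`, the usual column checks apply and the result stays a plain data frame, which matches the trade-off stated in the description.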
DASpringate / get_GOLD_metadata.R
Created June 14, 2012 09:52
Extracts metadata from scraped GOLD genome database files into a single flatfile
#!/usr/bin/Rscript
# Extracts a variety of organism, environmental, sequencing and project metadata
# from saved GOLDstamp files (genomesonline.org)
# and saves it as a flatfile with 1 taxon per line
# See gist.github.com/2929217 to scrape the files from a treebase taxa list
require(XML)
url <- "http://www.treebase.org/treebase-web/search/study/taxa.html?id=10965"
DASpringate / treebase_metadata.py
Created June 14, 2012 09:10
Extracts organism metadata for taxa in treebase repo from GenomesOnline via ncbi
#!/usr/bin/python2
# Quick and dirty web scraper for extracting html files of
# Organism information, Genome Project Information,
# Sequencing Information, Environmental Metadata
# and organism metadata
# from GOLD databases based on treebase taxa lists, via ncbi
# Saves pages in the current directory
from urllib2 import urlopen