David Winter dwinter

@dwinter
dwinter / ex.r
Last active August 29, 2015 14:00
library(rentrez)

# Find the nuccore record for a GenBank accession, then follow its link to the taxonomy database
gb_acc <- "JF493405"
nuc_id <- entrez_search(db="nuccore", term=paste0(gb_acc, "[accn]"))$id
tax_links <- entrez_link(db="taxonomy", dbfrom="nuccore", id=nuc_id)
tax_summ <- entrez_summary(db="taxonomy", id=tax_links$nuccore_taxonomy)
{
"metadata": {
"name": "",
"signature": "sha256:98f1acba9791082035860796ccf52294670282ba8deb7745219206b5182498d8"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
#!/usr/bin/env python
"""
Fetch all FASTQ files associated with an SRA ID
usage:
sra_fetch.py [sra-id]
"""
---
title: rentrez tutorial
layout: tutorial
packge_version: 0.2.1
---
```{r, eval=TRUE, echo=FALSE}
opts_chunk$set(fig.path="../assets/tutorial-images/rentrez/")
```
dwinter / ot.md
Created July 1, 2014 18:51
Open tree proposal

#An rOpenSci library for the Open Tree of Life API.

rOpenSci is a project that provides programmatic access to data repositories from the popular R programming language. rOpenSci already provides libraries to query the phylogeny databases treeBASE and Phylomatic, as well as data resources provided by NCBI and Dryad. A library wrapping the Open Tree of Life would be an excellent addition to the rOpenSci project and would hopefully increase the availability of the Open Tree of Life data.

I imagine the first step in creating such a library would be to faithfully map

dwinter / NESCent_blog.md
Last active August 29, 2015 14:05
rOpenSci at NESCent Open Tree of Life Hackathon

#rOpenSci at NESCent Open Tree of Life Hackathon

The Open Tree of Life project aims to synthesize our combined knowledge of how organisms relate to each other, and make the results available to anyone who wants to use them. At present, the project contains data from more than 4 000 published phylogenies, which combine with other data sources to make a tree that covers 2.5 million species.

In September, the Open Tree of Life team are holding a hackathon to develop tools that use the project's web services to extract, annotate and add data. We are excited to say that Francois Michonneau and I will be attending the hackathon, where we plan to work with Joseph W. Brown on an R package that allows users to interact with the Open Tree data.

Joseph has already written a good deal of the code for this package, so a key goal for the

dwinter / worfflow.md
Created September 3, 2014 22:23
Workflow of read-alignment

For the most part, the pipeline below follows the GATK's "best practice" advice for calling variants from short reads; the GATK documentation covers the process in more detail.

For the Tetrahymena and Plasmodium projects these steps have been controlled by a set of shell scripts (see /home/david/malaria/scripts for the latest iteration) that automate various steps along the way. I usually run the scripts with all output redirected to a log file so anything interesting is recorded: `./script.sh &> logs/script.log`

##Align reads (Bowtie 2)

Prior to alignment, you need to create an index of the reference genome:

$ bowtie2-build -f [path to ref] [index_file_stem]
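
The indexing and logging steps above could also be driven from a small script. The sketch below is in Python rather than shell and is only an illustration: the helper names `bowtie2_build_command` and `run_logged` are hypothetical, not part of the scripts mentioned above. It builds the same `bowtie2-build -f` call and writes stdout and stderr to a single log file, as `&>` does.

```python
import subprocess
from pathlib import Path


def bowtie2_build_command(ref_fasta, index_stem):
    """Return the argument list for the bowtie2-build call shown above."""
    # -f tells bowtie2-build that the reference is a FASTA file
    return ["bowtie2-build", "-f", str(ref_fasta), str(index_stem)]


def run_logged(command, log_path):
    """Run a command, sending stdout and stderr to one log file (like `&>`)."""
    log_path = Path(log_path)
    # Create the log directory if it does not exist yet
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

With placeholder paths, a call might look like `run_logged(bowtie2_build_command("ref.fa", "ref_index"), "logs/bowtie2_build.log")`. If `bowtie2-build` is not on the `PATH`, `subprocess.run` raises `FileNotFoundError`.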

#Half-time report from the tree-for-all hackathon

Apart from being a great deal of fun, the R projects at the OpenTreeOfLife hackathon have been making some good progress.

##Introducing rotl

Francois Michonneau, Joseph Brown and I have been working on a package that wraps OpenTree's various data APIs, allowing users to search for trees, taxa and phylogenetic studies and pull trees down into their R sessions. Although we've started by focusing on low-level functions that wrap a single API call, there are already a few interesting functions (check out the repo's README for a couple of examples).

We've been working with Python and Ruby developers to generate complementary libraries, and hope to use the rest of the week to finish some convenience functions that wrap up multiple calls to the OpenTree APIs to achiev

dwinter / licenses.md
Created September 30, 2014 18:10
rentrez and pmc licenses

#What's the easiest way to extract license info from PMC?

As a test, play with a 50-paper request from PMC

library(rentrez)
search <- entrez_search(db="pmc", term="Tetrahymena", retmax=50)
x <- seq(0,5,0.05)
plot(x, dgamma(x, shape=10, scale=0.1), type='l')
lines(x, dgamma(x, shape=2, scale=0.5), col='red')
lines(x, dgamma(x, shape=1, scale=1), col='blue')
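
The three gamma densities plotted above share a mean of 1 (shape × scale) and differ only in variance (shape × scale²). The sketch below checks this with the Python standard library rather than R. The function names are illustrative, and the density formula is the standard shape/scale parameterisation, which is also the one `dgamma(x, shape, scale)` uses.

```python
import math

# (shape, scale) pairs taken from the dgamma() calls above
PARAMS = [(10, 0.1), (2, 0.5), (1, 1.0)]


def gamma_mean(shape, scale):
    """Mean of a gamma distribution in the shape/scale parameterisation."""
    return shape * scale


def gamma_var(shape, scale):
    """Variance of a gamma distribution in the shape/scale parameterisation."""
    return shape * scale ** 2


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0: x^(k-1) * exp(-x/theta) / (Gamma(k) * theta^k)."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


for shape, scale in PARAMS:
    print(shape, scale, gamma_mean(shape, scale), gamma_var(shape, scale))
```

As the shape grows and the scale shrinks, the variance falls, so the density concentrates more tightly around the shared mean.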