
Taylor Reiter taylorreiter

@taylorreiter
taylorreiter / download_ncbi_fungi.md
Created October 20, 2017 20:22
Download NCBI fungal genomes from refseq and genbank

Create a virtual environment and install ncbi-genome-download

cd ~
python2.7 -m virtualenv ncbi-genome-downloadEnv
source ncbi-genome-downloadEnv/bin/activate
cd ncbi-genome-downloadEnv
git clone https://github.com/kblin/ncbi-genome-download.git
cd ncbi-genome-download
pip install .
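The gist title promises fungal genomes from both RefSeq and GenBank, but the install step stops before any download. Below is a minimal Python 3 sketch that builds one ncbi-genome-download call per section and runs them only when asked. It assumes the CLI accepts `--section {refseq,genbank}`, `-F <format>` and a positional group name such as `fungi`, so check `ncbi-genome-download --help` for the installed version first. The helper names `build_commands` and `download` are hypothetical.

```python
import subprocess

# The two NCBI sections named in the gist title
SECTIONS = ("refseq", "genbank")


def build_commands(group="fungi", file_format="fasta"):
    """Return one ncbi-genome-download argv list per section.

    Assumes the CLI takes `--section`, `-F` and a positional group;
    verify with `ncbi-genome-download --help`.
    """
    return [
        ["ncbi-genome-download", "--section", section, "-F", file_format, group]
        for section in SECTIONS
    ]


def download(dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first download that fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download(dry_run=True)
```

Run it with python3 from a shell where the virtualenv above is activated, so that `ncbi-genome-download` is on the PATH. Pass `dry_run=False` once the printed commands look right.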
taylorreiter / new-repo-from-prev.md
Created September 1, 2017 20:38
code used to create new repo from previous

An example of how to create a new git repository from a previous git repository

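git clone https://github.com/taylorreiter/IGI_preproposal.git
cd IGI_preproposal
# filter-branch must be run from the top level of the working tree;
# it rewrites all branches so that computational_pilot/ becomes the repository root
git filter-branch --subdirectory-filter computational_pilot -- --all
# renamed folder to cultivar_seq
cd ..
mv IGI_preproposal cultivar_seq
cd cultivar_seq
# origin still points at IGI_preproposal, so repoint it at the new (empty) GitHub repo and push
git remote set-url origin https://github.com/taylorreiter/cultivar_seq.git
git push -u origin --all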
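The following toy Python model shows what `--subdirectory-filter` does to history. It is an illustration only and is not how git implements the rewrite. Each commit is modelled as a dict snapshot of path to content. Only paths under the subdirectory survive, with the prefix stripped, and commits that change nothing under the subdirectory drop out. The function name `subdirectory_filter` is hypothetical.

```python
def subdirectory_filter(commits, subdir):
    """Toy model of `git filter-branch --subdirectory-filter <subdir>`.

    `commits` is a list of {path: content} snapshots, oldest first.
    Returns the rewritten snapshots with `subdir/` as the new root.
    """
    prefix = subdir.rstrip("/") + "/"
    rewritten = []
    previous = {}
    for snapshot in commits:
        # keep only files under subdir/, re-rooted at subdir/
        tree = {
            path[len(prefix):]: content
            for path, content in snapshot.items()
            if path.startswith(prefix)
        }
        # commits that did not change anything under subdir/ are dropped
        if tree != previous:
            rewritten.append(tree)
            previous = tree
    return rewritten


if __name__ == "__main__":
    history = [
        {"README.md": "a"},
        {"README.md": "a", "computational_pilot/run.sh": "v1"},
        {"README.md": "b", "computational_pilot/run.sh": "v1"},
        {"README.md": "b", "computational_pilot/run.sh": "v2"},
    ]
    for tree in subdirectory_filter(history, "computational_pilot"):
        print(tree)
```

Running the example keeps two of the four commits. The first commit predates the subdirectory, and the third only edits README.md.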

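Create a new volume in the EC2 console and attach it to the instance. Here it shows up as /dev/xvdg (lsblk lists the attached block devices).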

Format the new volume

sudo mkfs -t ext4 /dev/xvdg

Create the mount point if it does not exist, then mount the new volume

sudo mkdir -p /mnt2
sudo mount /dev/xvdg /mnt2/
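mkfs erases whatever is on the device it is given, so it is worth confirming that the device is not already mounted before formatting it. Below is a minimal Python sketch that reads /proc/mounts-style text (fields: device, mount point, filesystem type, options, dump, pass). It refuses a device if the device itself or one of its numbered partitions is mounted. The function names are hypothetical. The check cannot tell whether an unmounted volume already holds data. For that, `sudo file -s /dev/xvdg` reports just "data" when there is no filesystem on the device.

```python
import os


def mounted_devices(mounts_text):
    """Map each mounted device to its mount point, from /proc/mounts-style text."""
    devices = {}
    for line in mounts_text.splitlines():
        # each line: device mountpoint fstype options dump pass
        fields = line.split()
        if len(fields) >= 2:
            devices[fields[0]] = fields[1]
    return devices


def safe_to_format(device, mounts_text):
    """Return True if neither `device` nor one of its numbered partitions is mounted."""
    for dev in mounted_devices(mounts_text):
        suffix = dev[len(device):]
        if dev.startswith(device) and (suffix == "" or suffix.isdigit()):
            return False
    return True


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        # device name taken from the mkfs/mount commands above
        print("/dev/xvdg safe to format:", safe_to_format("/dev/xvdg", fh.read()))
```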
taylorreiter / blog-setup.md
Last active April 25, 2017 03:37
Setup instructions for blog
taylorreiter / kraken_mircea.md
Last active April 21, 2017 16:04
kraken_mircea.md

Kraken's standard database-building scripts broke when NCBI changed its FTP structure and stopped using GI numbers. Instead, use the Perl scripts from the post below, which are supposed to deal with the issue. (I was able to get the fungal database to build with the same loop; the only difference was that only fungi were included.)

http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/

As of September 2016, a commenter on the post reported that this method works, but something went wrong when I tried it.
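These workarounds can avoid GI numbers because Kraken 1 lets you set a sequence's taxon directly with a `kraken:taxid|<id>` tag in the sequence ID (for example `>sequence16|kraken:taxid|32630`). Below is a minimal Python sketch of that header rewrite for one genome file with a known taxon ID. It is not the Perl script from the post, and the function and script names are hypothetical.

```python
import sys


def tag_fasta_headers(lines, taxid):
    """Yield FASTA lines with '|kraken:taxid|<taxid>' appended to every sequence ID.

    The description after the first space is kept unchanged;
    sequence lines pass through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            seq_id, _, description = line[1:].rstrip("\n").partition(" ")
            header = ">%s|kraken:taxid|%s" % (seq_id, taxid)
            if description:
                header += " " + description
            yield header + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python tag_kraken.py genome.fna TAXID > tagged.fna
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(tag_fasta_headers(fh, sys.argv[2]))
```

The tagged files can then be added with `kraken-build --add-to-library tagged.fna --db <DBNAME>` before building the database.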

Ran on an r4.8xlarge EC2 instance.

Get the sequences (note the script filters for complete genomes)
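The note above says the script keeps only complete genomes. Below is a hedged Python sketch of that kind of filter. It assumes the download list comes from NCBI's tab-separated assembly_summary.txt and that taxid, assembly_level and ftp_path are its 6th, 12th and 20th columns (check this against the "# assembly_accession" header line of the file you download). The function name is hypothetical, and this is not the Perl script from the post.

```python
import sys

# 0-based column positions in assembly_summary.txt (assumed layout)
TAXID_COL = 5
ASSEMBLY_LEVEL_COL = 11
FTP_PATH_COL = 19


def complete_genomes(lines):
    """Yield (taxid, ftp_path) for every row whose assembly_level is 'Complete Genome'."""
    for line in lines:
        # comment and column-header lines start with '#'
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > FTP_PATH_COL and fields[ASSEMBLY_LEVEL_COL] == "Complete Genome":
            yield fields[TAXID_COL], fields[FTP_PATH_COL]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for taxid, ftp_path in complete_genomes(fh):
            print(taxid, ftp_path, sep="\t")
```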

Downloaded the trimmed, abundance-filtered reads for the mircea data set. Ran kaiju as shown below, using databases built here (https://gist.github.com/taylorreiter/2511b0c6e904b455e7002742c5da1492).

Database: kaijudb -e (the nr + eukaryotes database in ~/kaijudb_e)

~/kaiju/bin/kaiju -t ~/kaijudb_e/nodes.dmp -f ~/kaijudb_e/kaiju_db_nr_euk.fmi -i SRR606249.pe.qc.fq.gz.abundtrim -v -o kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out

Then added taxon names to the kaiju output

~/kaiju/bin/addTaxonNames -t ~/kaijudb_e/nodes.dmp -n ~/kaijudb_e/names.dmp -i kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out -o kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out.names
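To get a quick overview of the .names file, here is a minimal Python sketch that counts classified and unclassified reads and lists the most common taxon names. It assumes tab-separated kaiju output in which column 1 is C or U, column 2 the read name, column 3 the NCBI taxon ID, and the last column the name appended by addTaxonNames. The function and script names are hypothetical.

```python
import sys
from collections import Counter


def summarize_kaiju(lines, top=10):
    """Return (classified, unclassified, most common taxon names) for kaiju output.

    Assumes column 1 is 'C'/'U' and, for classified reads, the last
    tab-separated column is the taxon name added by addTaxonNames.
    """
    classified = unclassified = 0
    names = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "C":
            classified += 1
            names[fields[-1]] += 1
        elif fields[0] == "U":
            unclassified += 1
    return classified, unclassified, names.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name):
    # python summarize_kaiju.py kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out.names
    with open(sys.argv[1]) as fh:
        n_c, n_u, top_taxa = summarize_kaiju(fh)
    print("classified: %d of %d reads" % (n_c, n_c + n_u))
    for name, count in top_taxa:
        print("%8d  %s" % (count, name))
```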

Download Kaiju software

cd ~
git clone https://github.com/bioinformatics-centre/kaiju.git
cd kaiju/src
make

From kaiju github:

taylorreiter / sourmash_plotting_twiddly_output
Last active March 15, 2017 19:35
Very sloppy code from sourmash RNA-seq testing via the Schurch 2016 yeast dataset.
# Code to visualize numpy output from sourmash compare
# NB: the output was saved to a CSV file from Python using the commands below:
# sourmash compare -o yeast_syrah_L2 ./*sig # completed in terminal
# import numpy
# from sourmash_lib import fig
# import pylab
# D, labels = fig.load_matrix_and_labels('yeast_syrah_L2')
# D = (D + D.T) / 2.0
# numpy.savetxt("L2_numpy.csv", D, delimiter=",")
# Orrrrrr