Taylor Reiter taylorreiter

taylorreiter / sourmash_plotting_twiddly_output
Last active March 15, 2017 19:35
Very sloppy code from sourmash RNA-seq testing using the Schurch 2016 yeast dataset.
# Code to visualize numpy output from sourmash compare
# NB output was saved to csv file from python using the command:
# sourmash compare -o yeast_syrah_L2 ./*sig # completed in terminal
# import numpy
# from sourmash_lib import fig
# import pylab
# D, labels = fig.load_matrix_and_labels('yeast_syrah_L2')
# D = (D + D.T) / 2.0
# numpy.savetxt("L2_numpy.csv", D, delimiter=",")
# Orrrrrr

Download Kaiju software

cd ~
git clone https://github.com/bioinformatics-centre/kaiju.git
cd kaiju/src
make

From kaiju github:

Downloaded the trimmed, abundance-filtered reads for the mircea data set. Ran kaiju as indicated below, using the databases built here (https://gist.github.com/taylorreiter/2511b0c6e904b455e7002742c5da1492).

kaijudb -e

~/kaiju/bin/kaiju -t ~/kaijudb_e/nodes.dmp -f ~/kaijudb_e/kaiju_db_nr_euk.fmi -i SRR606249.pe.qc.fq.gz.abundtrim -v -o kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out

Then added taxon names to the kaiju output

~/kaiju/bin/addTaxonNames -t ~/kaijudb_e/nodes.dmp -n ~/kaijudb_e/names.dmp -i kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out -o kaijudb_e_SRR606249.pe.qc.fq.gz.abundtrim.out.names
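
A quick sanity check on the kaiju output is to count classified versus unclassified reads. This is only a sketch: it assumes the standard tab-separated kaiju output, where the first column is `C` (classified) or `U` (unclassified). The temporary file and its three rows are made up for illustration and stand in for the real `.out` file.

```shell
# Hypothetical three-read stand-in for the real kaiju output file
demo_out=$(mktemp)
printf 'C\tread1\t4932\nU\tread2\t0\nC\tread3\t5476\n' > "$demo_out"

# Column 1 holds the classification flag; count how many reads carry each flag
kaiju_counts=$(cut -f 1 "$demo_out" | sort | uniq -c | awk '{print $2 "=" $1}' | paste -sd' ' -)
echo "$kaiju_counts"
rm -f "$demo_out"
```

On a real run, swap `$demo_out` for the `.out` (or `.out.names`) file; the flag stays in the first column either way.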
taylorreiter / kraken_mircea.md
Last active April 21, 2017 16:04
kraken_mircea.md

Building a Kraken database is broken because NCBI changed its FTP structure and dropped GI numbers. Use the perl scripts from the post below, which supposedly deal with the issue (note that I was able to get the fungal build to work with the same loop etc., where the only difference was that only fungi were included).

http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/

As of September 2016, someone commented that this method works, but something went wrong for me.

Ran on r4.8xlarge.

Get the sequences (note the script filters for complete genomes)

taylorreiter / blog-setup.md
Last active April 25, 2017 03:37
Setup instructions for blog

Make a new volume from the EC2 GUI.

Format the new volume

sudo mkfs -t ext4 /dev/xvdg

Mount the new volume

sudo mount /dev/xvdg /mnt2/
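
The mount step assumes `/mnt2` already exists (`sudo mkdir -p /mnt2` creates it if not). To confirm afterwards that the volume really is mounted, something like the sketch below can help. It assumes a Linux host with `/proc/mounts`; the helper name `is_mounted` is made up for this note.

```shell
# Hypothetical helper: succeeds if the given path appears as a mount point
# in /proc/mounts (Linux only; the mount point is the second field)
is_mounted() {
    grep -qs " $1 " /proc/mounts
}

if is_mounted /mnt2; then
    mount_status="mounted"
else
    mount_status="not mounted"
fi
echo "/mnt2 is $mount_status"
```

`df -h /mnt2` gives the same information along with the volume size.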
taylorreiter / new-repo-from-prev.md
Created September 1, 2017 20:38
code used to create new repo from previous

An example of how to create a new git repository from a previous git repository

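git clone https://github.com/taylorreiter/IGI_preproposal.git
# filter-branch has to be run from the top level of the working tree
cd IGI_preproposal
# keep only the history of computational_pilot/ and make it the repo root
git filter-branch --subdirectory-filter computational_pilot -- --all
# renamed folder cultivar_seq
git init
git add .
git commit -m "First commit"
# if origin still points at IGI_preproposal (left over from the clone),
# use `git remote set-url origin <url>` instead of `remote add`
git remote add origin https://github.com/taylorreiter/cultivar_seq.git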
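
To see what `--subdirectory-filter` does before running it on a real repository, here is a throwaway sketch on a scratch repo. Every name in it (`demo_repo`, `sub`, `other`, the file names) is hypothetical. The `FILTER_BRANCH_SQUELCH_WARNING` variable only silences the deprecation notice that newer git versions print.

```shell
# Build a scratch repo with two directories; only sub/ should survive
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q demo_repo
cd demo_repo
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir sub other
echo "keep" > sub/keep.txt
echo "drop" > other/drop.txt
git add .
git commit -q -m "add sub and other"

# Run from the top level of the working tree
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --subdirectory-filter sub -- --all >/dev/null 2>&1

# After filtering, the contents of sub/ sit at the repository root
tracked_files=$(git ls-files)
echo "$tracked_files"
```

The history is rewritten in place, so it is safest to try this on a fresh clone rather than a working copy with unpushed changes.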
taylorreiter / download_ncbi_fungi.md
Created October 20, 2017 20:22
Download NCBI fungal genomes from refseq and genbank

Create a virtual environment and install ncbi-genome-download

cd ~
python2.7 -m virtualenv ncbi-genome-downloadEnv
source ncbi-genome-downloadEnv/bin/activate
cd ncbi-genome-downloadEnv
git clone https://github.com/kblin/ncbi-genome-download.git
cd ncbi-genome-download
pip install .
git checkout -b tr_sourmash_2019 origin/tr_sourmash_2019 --track

Sum over a column in a csv:

cut -d "," -f 2 file.csv | paste -sd+ - | bc
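
The `paste | bc` pipeline assumes every row of column 2 is numeric, so a header row breaks it. A minimal awk sketch that skips the first line is below. The demo file and its contents are made up, and it does not handle quoted fields that contain commas.

```shell
# Hypothetical demo csv with a header row
demo_csv=$(mktemp)
printf 'sample,count\na,10\nb,20\nc,30\n' > "$demo_csv"

# -F, splits fields on commas; NR > 1 skips the header row
col_sum=$(awk -F, 'NR > 1 { total += $2 } END { print total }' "$demo_csv")
echo "$col_sum"
rm -f "$demo_csv"
```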

Pretty view a csv in less:

cat file.csv | column -t -s, | less -S