@taylorreiter
Last active April 21, 2017 16:04
kraken_mircea.md

Kraken's standard database build is broken because NCBI changed its FTP structure and dropped GI numbers. Use the Perl scripts from the post below, which are supposed to work around the issue. (Note that I was able to get a fungi-only database to work with the same loop etc.; the only difference was that only fungi were included.)

http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/

As of September 2016, someone commented that this method works, but something went wrong for me.

Ran on r4.8xlarge.

Get the sequences (note the script filters for complete genomes)

perl ~/Kraken_db_install_scripts/download_fungi.pl
perl ~/Kraken_db_install_scripts/download_bacteria.pl
perl ~/Kraken_db_install_scripts/download_archaea.pl
perl ~/Kraken_db_install_scripts/download_protozoa.pl
perl ~/Kraken_db_install_scripts/download_viral.pl

Build database step 1: Download taxonomy

kraken-build --download-taxonomy --db kraken_bvfpa_080416

Build database step 2: add to library

for dir in fungi protozoa archaea viral bacteria; do
        for fna in "$dir"/*.fna; do
                kraken-build --add-to-library "$fna" --db kraken_bvfpa_080416
        done
done
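
Since the full build classified nothing, it may be worth confirming that each download directory actually contains `.fna` files before running the loop above. This is a minimal sketch, not part of the original run; `count_fna` is a hypothetical helper name, and it assumes the five directories sit in the current working directory.

```shell
# Hypothetical helper: print "<dir><TAB><number of .fna files>" for each directory given.
# Assumption: the .fna files sit directly inside each directory (no subdirectories).
count_fna() {
    for fna_dir in "$@"; do
        # find copes with an empty or missing directory without a glob error
        fna_n=$(find "$fna_dir" -maxdepth 1 -name '*.fna' 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$fna_dir" "$fna_n"
    done
}
```

Calling `count_fna fungi protozoa archaea viral bacteria` prints one line per directory; a count of 0 means the loop would add nothing from that directory.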

Build database step 3: make the kraken database

kraken-build --build --db kraken_bvfpa_080416

Try to run it

kraken --preload --db ~/Kraken_db_install_scripts/downloads/kraken_bvfpa_080416 --fastq-input SRR606249.pe.qc.fq.gz.abundtrim > kraken_bvfpa_SRR606249.pe.qc.fq.gz.abundtrim.out

Classified no sequences.
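
The result can also be checked from the output file itself, because the first column of kraken's per-read output is `C` (classified) or `U` (unclassified). A small sketch of that tally follows; `tally_kraken` is a hypothetical name, and nothing here diagnoses why the build failed.

```shell
# Hypothetical helper: count classified (C) and unclassified (U) reads
# in a kraken per-read output file, using only the first tab-delimited column.
tally_kraken() {
    awk -F'\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END { printf "classified\t%d\nunclassified\t%d\n", c, u }
    ' "$1"
}
```

For example, `tally_kraken kraken_bvpfa_SRR606249.pe.qc.fq.gz.abundtrim.out` should report zero classified reads for the run above.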

Try again with minikraken

wget http://ccb.jhu.edu/software/kraken/dl/minikraken.tgz
tar -zxvf minikraken.tgz
kraken --preload --db ~/minikraken_20141208 --fastq-input SRR606249.pe.qc.fq.gz.abundtrim > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out

Minikraken produced these results

Loading database... complete.
Processed 13080702 sequences (1310133482 bp) ...classify: malformed fastq file - quality header (@S)
13080775 sequences (1310.14 Mbp) processed in 398.106s (1971.4 Kseq/m, 197.46 Mbp/m).
  11317542 sequences classified (86.52%)
  1763233 sequences unclassified (13.48%)
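
The `malformed fastq file - quality header (@S)` message looks like kraken hit a line starting with `@S` where it expected the `+` separator, though I have not confirmed that against the input. A rough way to find the first broken record is sketched below. It assumes an uncompressed FASTQ with strict 4-line records (no wrapped sequences), and `first_bad_fastq_record` is a hypothetical name.

```shell
# Hypothetical helper: report the first line that breaks the 4-line FASTQ layout.
# Assumption: every record is exactly header, sequence, "+" line, quality.
# Prints nothing if the whole file fits that layout.
first_bad_fastq_record() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": expected @ header"; bad = 1; exit }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": expected + separator"; bad = 1; exit }
        END { if (!bad && NR % 4 != 0) print "file ends mid-record at line " NR }
    ' "$1"
}
```

Running `first_bad_fastq_record SRR606249.pe.qc.fq.gz.abundtrim` would point at the offending line number, if the file really is plain 4-line FASTQ.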

Add labels to the kraken output

kraken-translate --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out_labels

Generate a tab-delimited report

kraken-report --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out.tab
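
The report is tab-delimited with six columns: percent of reads, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, and an indented name. As a sketch of how to skim it, the hypothetical helper below lists the species-level rows (rank code `S`) with the most clade reads; the function name and the default of 10 rows are my own choices.

```shell
# Hypothetical helper: show the top species rows from a kraken-report table.
# $1 = report file, $2 = number of rows to show (defaults to 10).
top_species() {
    awk -F'\t' '$4 == "S" {
        name = $6
        sub(/^ +/, "", name)        # strip the indentation kraken-report adds
        print $2 "\t" name          # clade read count, then species name
    }' "$1" | sort -t "$(printf '\t')" -k1,1nr | head -n "${2:-10}"
}
```

For example, `top_species minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out.tab 20` would list the 20 species with the most reads.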

And into mpa (MetaPhlAn-style) format

kraken-mpa-report --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out.mpa