taylorreiter/kraken_mircea.md

## kraken_mircea.md

      
The stock Kraken database build is broken because of changes at NCBI (new FTP structure, no GI numbers). I used Perl scripts that supposedly deal with the issue:
http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/
(Note that I was able to get a fungal database to work with the same loop etc., where the only difference was that only fungi were included.)
As of September 2016, someone commented that this method works, but something went wrong for me.
Ran on an AWS r4.8xlarge instance.
Get the sequences (note that the scripts filter for complete genomes)
perl ~/Kraken_db_install_scripts/download_fungi.pl
perl ~/Kraken_db_install_scripts/download_bacteria.pl
perl ~/Kraken_db_install_scripts/download_archaea.pl
perl ~/Kraken_db_install_scripts/download_protozoa.pl
perl ~/Kraken_db_install_scripts/download_viral.pl
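
Before building anything, it may be worth checking that each download actually produced FASTA files. A minimal sketch, assuming the scripts write `.fna` files into directories named `fungi`, `protozoa`, `archaea`, `viral` and `bacteria` in the current working directory (the names used in the loop in step 2); `count_fna` is a hypothetical helper, not part of the download scripts:

```shell
# Hypothetical helper: count the .fna files directly inside one directory.
count_fna() {
    find "$1" -maxdepth 1 -type f -name '*.fna' | wc -l | tr -d ' '
}

# Print one line per expected download directory: name, then file count.
for dir in fungi protozoa archaea viral bacteria; do
    if [ -d "$dir" ]; then
        printf '%s\t%s\n' "$dir" "$(count_fna "$dir")"
    else
        printf '%s\tmissing\n' "$dir"
    fi
done
```

A directory that is missing or reports 0 files would point at the download step rather than at the build.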

Build database step 1: Download taxonomy
kraken-build --download-taxonomy --db kraken_bvfpa_080416

Build database step 2: add to library
for dir in fungi protozoa archaea viral bacteria; do
        # Glob directly instead of parsing ls output, and quote the path
        for fna in "$dir"/*.fna; do
                kraken-build --add-to-library "$fna" --db kraken_bvfpa_080416
        done
done

Build database step 3: make the kraken database
kraken-build --build --db kraken_bvfpa_080416
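
A quick sanity check after the build is to confirm that the database directory holds the files Kraken needs at classification time. This is only a sketch: the file list (`database.kdb`, `database.idx`, `taxonomy/nodes.dmp`, `taxonomy/names.dmp`) is my understanding of what a built Kraken 1 database contains, and `check_kraken_db` is a hypothetical helper, not a Kraken command.

```shell
# Hypothetical helper: report which expected database files are present and non-empty.
# The file list is an assumption about the Kraken 1 database layout.
check_kraken_db() {
    db=$1
    status=0
    for f in database.kdb database.idx taxonomy/nodes.dmp taxonomy/names.dmp; do
        if [ -s "$db/$f" ]; then
            printf 'ok\t%s\n' "$f"
        else
            printf 'missing or empty\t%s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Example call, using the database name from the build commands:
# check_kraken_db kraken_bvfpa_080416
```

The function returns non-zero if any file is missing or empty, so it can also gate the classification step in a script.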

Try to run it
kraken --preload --db ~/Kraken_db_install_scripts/downloads/kraken_bvfpa_080416 --fastq-input SRR606249.pe.qc.fq.gz.abundtrim > kraken_bvpfa_SRR606249.pe.qc.fq.gz.abundtrim.out

No sequences were classified.
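
The same conclusion can be read off the output file itself, since the first tab-separated column of each Kraken output line is `C` (classified) or `U` (unclassified). A minimal sketch that tallies the two; `summarize_kraken_out` is a hypothetical helper name:

```shell
# Hypothetical helper: tally classified (C) and unclassified (U) reads
# from the first column of a Kraken output file.
summarize_kraken_out() {
    awk -F'\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END {
            total = c + u
            if (total == 0) { print "no records"; exit }
            printf "classified\t%d\t%.2f%%\n", c, 100 * c / total
            printf "unclassified\t%d\t%.2f%%\n", u, 100 * u / total
        }
    ' "$1"
}

# Example call on the output written by the run above:
# summarize_kraken_out kraken_bvpfa_SRR606249.pe.qc.fq.gz.abundtrim.out
```

If this prints "no records", the output file is empty and the run probably failed before classifying anything, which is a different problem from a database that matches nothing.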
Try again with minikraken
wget http://ccb.jhu.edu/software/kraken/dl/minikraken.tgz
tar -zxvf minikraken.tgz
kraken --preload --db ~/minikraken_20141208  --fastq-input SRR606249.pe.qc.fq.gz.abundtrim > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out

Minikraken produced these results
Loading database... complete.
Processed 13080702 sequences (1310133482 bp) ...classify: malformed fastq file - quality header (@S)
13080775 sequences (1310.14 Mbp) processed in 398.106s (1971.4 Kseq/m, 197.46 Mbp/m).
  11317542 sequences classified (86.52%)
  1763233 sequences unclassified (13.48%)
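
The `malformed fastq file - quality header (@S)` message suggests the reader hit a line starting with `@S` where it expected the `+` separator, although I have not confirmed that against the Kraken source. A rough way to find such a spot is sketched below. It assumes plain four-line FASTQ records (no wrapped sequences), and `check_fastq` is a hypothetical helper:

```shell
# Hypothetical helper: stop at the first record whose header line does not
# start with "@" or whose separator line does not start with "+".
# Reads the named file, or stdin if no file is given.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected @ header\n", NR; bad = 1; exit
        }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: expected + separator\n", NR; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                printf "truncated record: %d lines in total\n", NR; bad = 1
            }
            exit bad + 0
        }
    ' "$@"
}

# Example call on the reads used above:
# check_fastq SRR606249.pe.qc.fq.gz.abundtrim
```

The exit status is 0 for a file that passes and 1 otherwise, and the reported line number gives a place to start looking in the reads file.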

Add labels to the kraken output
kraken-translate --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out_labels

Convert to a tab-delimited report
kraken-report --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out.tab

And into MPA (MetaPhlAn-style) format
kraken-mpa-report --db ~/minikraken_20141208 minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out > minikrakenSRR606249.pe.qc.fq.gz.abundtrim.out.mpa