@ramongallego
Created April 9, 2024 15:42
kraken2 db from NCBI db
# First, download the NCBI database you need to your computer. You can either browse https://ftp.ncbi.nlm.nih.gov/blast/db/
# and download it from there or, if it is a multi-file database, run
# perl <path/to/ncbi/bin>/update_blastdb.pl --decompress <name_of_db>
# I recommend keeping each db in its own folder, as a second db downloaded into the same folder may overwrite the taxdb.btd/taxdb.bti files
######
#
## Once the db is downloaded, the next step is to extract its sequences in FASTA format
# I am sure there is an easier way to build a Kraken2 db from a downloaded BLAST db, but here we are
## USAGE: bash Format_NCBI_4_Kraken2.sh <folder_with_NCBI_db>/<NCBI_db_name> <KRAKEN_DB_NAME>
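set -euo pipefail
if [ "$#" -ne 2 ]; then
    echo "USAGE: bash $0 <folder_with_NCBI_db>/<NCBI_db_name> <KRAKEN_DB_NAME>" >&2
    exit 1
fi
DBNAME="$2"
input="$1"
input_folder=$(dirname "$input")
input_db=$(basename "$input")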
## STEP1 Extract accession, taxid and sequence from the BLAST database, one record per line
blastdbcmd -db "$input" -entry all -out "${input_folder}/${input_db}.txt" -outfmt ">%a|kraken:taxid|%T,%s"
## STEP2 Reformat as FASTA: split each line at the comma into a header line and a sequence line
awk -F',' '{print $1"\n"$2}' "${input_folder}"/"${input_db}".txt > "${input_folder}"/"${input_db}".fasta
## STEP3 Download the NCBI taxonomy and start the custom db
kraken2-build --download-taxonomy --use-ftp --db "$DBNAME"
## STEP4 Add the new sequences to the db library
kraken2-build --add-to-library "${input_folder}/${input_db}.fasta" --db "$DBNAME"
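The script header recommends keeping one db per folder. A minimal sketch of that layout, assuming `update_blastdb.pl` saves files into the current working directory (its default) and that `NCBI_BIN`, a hypothetical variable, holds the path to the BLAST+ bin folder that contains it:

```shell
# Download a BLAST db into its own folder so that databases do not overwrite
# each other's taxdb.btd/taxdb.bti files.
# NCBI_BIN is a hypothetical variable pointing at the folder with update_blastdb.pl.
fetch_blastdb() {
    local db="$1"
    local root="${2:-.}"    # parent folder; defaults to the current one
    mkdir -p "${root}/${db}" || return 1
    # Run in a subshell so the caller's working directory is left unchanged
    (
        cd "${root}/${db}" &&
        perl "${NCBI_BIN:?set NCBI_BIN to the folder with update_blastdb.pl}/update_blastdb.pl" --decompress "$db"
    )
}
```

For example (illustrative paths), with `NCBI_BIN=$HOME/ncbi-blast/bin`, running `fetch_blastdb 16S_ribosomal_RNA "$HOME/blastdbs"` should leave the db in `$HOME/blastdbs/16S_ribosomal_RNA/`, ready to pass to the script as `$HOME/blastdbs/16S_ribosomal_RNA/16S_ribosomal_RNA`.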
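STEP1 writes each record as `>accession|kraken:taxid|taxid,sequence`. If the BLAST db carries no taxonomy information for some entries, `%T` is likely to come out as 0, and those records give Kraken2 no usable taxon. A quick check on the STEP1 output before going on; `count_missing_taxids` is a hypothetical helper, and the field positions assume the exact `-outfmt` string used in STEP1:

```shell
# Count records in the STEP1 output whose taxid field is 0.
# Assumes lines of the form >accession|kraken:taxid|taxid,sequence
count_missing_taxids() {
    # Split on '|' and ',': $1=>accession, $2=kraken:taxid, $3=taxid, $4=sequence
    awk -F'[|,]' '$3 == 0 { n++ } END { print n + 0 }' "$1"
}
```

Run it as `count_missing_taxids "${input_folder}/${input_db}.txt"`; a non-zero count suggests checking how the BLAST db was built before adding it to the Kraken2 library.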
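STEP1 and STEP2 go through an intermediate `.txt` file, which for a large database takes roughly as much disk space as the final FASTA. A sketch of the alternative: when `-out` is omitted, `blastdbcmd` writes to standard output, so its output can be piped straight into the same awk split. `to_kraken_fasta` is a hypothetical helper name:

```shell
# Turn ">accession|kraken:taxid|taxid,sequence" lines (the STEP1 format)
# into two-line FASTA records, reading stdin and writing stdout.
to_kraken_fasta() {
    awk -F',' '{ print $1; print $2 }'
}

# Example (needs BLAST+ and a downloaded db, so left commented out):
# blastdbcmd -db "$input" -entry all -outfmt ">%a|kraken:taxid|%T,%s" \
#     | to_kraken_fasta > "${input_folder}/${input_db}.fasta"
```

Inside a script, `set -o pipefail` makes a failing `blastdbcmd` show up in the pipeline's exit status instead of being masked by awk succeeding.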
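STEP3 runs the taxonomy download unconditionally, so re-running the script fetches it again. A sketch of a guard that skips the download when a previous run already left the taxonomy in place; `has_taxonomy` is a hypothetical helper, and the `<db>/taxonomy/nodes.dmp` and `names.dmp` paths are an assumption about where `kraken2-build` unpacks the NCBI taxdump:

```shell
# Check that STEP3 left the NCBI taxonomy tree in the Kraken2 db folder.
# Assumes kraken2-build --download-taxonomy unpacked the NCBI taxdump into
# <db>/taxonomy/, so nodes.dmp and names.dmp should exist and be non-empty.
has_taxonomy() {
    local db="$1"
    [ -s "${db}/taxonomy/nodes.dmp" ] && [ -s "${db}/taxonomy/names.dmp" ]
}

# Example: only download when the taxonomy is missing. Because every FASTA
# header from STEP2 already carries kraken:taxid, --skip-maps should avoid
# fetching the large accession-to-taxid maps.
# has_taxonomy "$DBNAME" || kraken2-build --download-taxonomy --skip-maps --use-ftp --db "$DBNAME"
```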
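The script stops after STEP4, so the library is populated but no searchable index exists yet. The Kraken2 custom-database workflow finishes with `kraken2-build --build`; a minimal sketch wrapping that call, where the helper name `finalize_kraken_db` and the default of one thread are choices made for this example:

```shell
# Final step after STEP4: build the Kraken2 index from the library and taxonomy.
# finalize_kraken_db is a hypothetical helper; --threads sets how many CPUs kraken2-build uses.
finalize_kraken_db() {
    local db="$1"
    local threads="${2:-1}"
    kraken2-build --build --db "$db" --threads "$threads"
}

# Example: finalize_kraken_db "$DBNAME" 8
```

Once the build succeeds, `kraken2-build --clean --db "$DBNAME"` can remove files the finished database no longer needs.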