sujaikumar/2015-11-30-blastp-bug.md

## 2015-11-30-blastp-bug.md

      
NCBI blastp seems to have a bug: it reports different top hits when -max_target_seqs is changed.
This is a serious problem because the first 20 hits (for example) should be the same whether
-max_target_seqs 100 or -max_target_seqs 500 is used.
The bug is reproducible on the command line when searching NCBI's nr BLAST database
(dated 25-Nov-2015) with NCBI BLAST 2.2.28+, 2.2.30+ and 2.2.31+.
At first I thought it had something to do with my local executable or BLAST database, but the same problem
also appears on the NCBI blastp web interface (as of 30-Nov-2015).
To test online, go to http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins
Enter the following FASTA sequence in the query text box:
>nHd.2.3.1.t00019-RA
MSNLGITDPCVDAMNSLGLKLEELQDLEVDAGLGNGGLGRLAACFMDSLATLSIPAIGYGIRYEFGIFNQRVINGEQVEE
RDDWLEFGDPWEKLRQDKKISVYFNGKTYVDKEGRSHWVDTQQIVD

Database nr (default)
Leave other fields blank
Check that the algorithm parameters say:
Max target sequences 100 (default)
Expect threshold: 1e-5 (instead of the default 10)
Leave rest of parameters as default.
Click BLAST:

The first 20 hits are all eukaryotic (top hit Trichuris trichiura, 8e-36). There are no bacterial hits in the results at all.

If you now change the Max target sequences to 500, and rerun, you see:

The top hit is now bacterial (Burkholderia kururiensis, 2e-40)

This is reproducible on the command line (using the versions of blastp mentioned above):
blastp -query input.fasta -db nr -outfmt 6 -max_target_seqs 100 -evalue 1e-5 >out.1e-5.max100.txt
blastp -query input.fasta -db nr -outfmt 6 -max_target_seqs 500 -evalue 1e-5 >out.1e-5.max500.txt
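
To check whether the two output files agree on their top hits, something like the following could be used. It is only a minimal sketch: it assumes the default 12-column -outfmt 6 layout (subject ID in the second column) and that rows appear in the order blastp reported them. The function names top_hits and compare_top_hits are made up for this example and are not part of BLAST.

```python
import csv
from pathlib import Path


def top_hits(path, n=20):
    """Return the first n distinct subject IDs from a BLAST -outfmt 6 file.

    Assumes the default tabular columns, where column 2 is sseqid.
    A subject can occupy several rows (one per HSP), so repeats are skipped.
    """
    hits = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 12:  # skip blank or malformed lines
                continue
            sseqid = row[1]
            if sseqid not in hits:
                hits.append(sseqid)
            if len(hits) == n:
                break
    return hits


def compare_top_hits(path_a, path_b, n=20):
    """Return (identical, hits_a, hits_b) for the top n subjects of two files."""
    hits_a = top_hits(path_a, n)
    hits_b = top_hits(path_b, n)
    return hits_a == hits_b, hits_a, hits_b


if __name__ == "__main__":
    # File names taken from the two blastp commands above.
    files = ("out.1e-5.max100.txt", "out.1e-5.max500.txt")
    if all(Path(f).is_file() for f in files):
        same, a, b = compare_top_hits(*files, n=20)
        print("top 20 identical:", same)
        for rank, (x, y) in enumerate(zip(a, b), start=1):
            marker = "" if x == y else "<-- differs"
            print(rank, x, y, marker, sep="\t")
```

If -max_target_seqs only changed the number of matches returned, the two lists would be identical for any n up to 100.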

Can someone else confirm that they have seen this bug? It's possible I am doing something silly, but
if not, then this is a serious bug, because max_target_seqs is only supposed to change the number of
matches returned, not the TOP hits. See http://www.ncbi.nlm.nih.gov/books/NBK279682/
Screenshots attached:

max100.png - https://www.dropbox.com/s/sbfaviez7hon4it/max100.png?dl=0
max500.png - https://www.dropbox.com/s/8z1kso2d53k3l5k/max500.png?dl=0