@macmanes
Created June 17, 2015 16:22
BUSCO error

I am having an issue running the sample data. Running with the -l example flag works and the results match the expected results, but running with -l eukaryota produces the error detailed below.

python3 ../BUSCO_v1.1b1.py -o euk_test -in target.fa -l eukaryota -m genome -c 10
eukaryota
*** Running tBlastN ***


Building a new DB, current time: 06/17/2015 12:07:21
New DB name:   euk_test
New DB title:  target.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0191998 seconds.
*** Getting coordinates for candidate regions! ***
*** pre-Augustus scaffold extraction ***
*** Running Augustus prediction ***
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
*** Error in `augustus': double free or corruption (fasttop): 0x0000000002e96c50 ***
Exiting Thread-9
Exiting Thread-6
Exiting Thread-7
Exiting Thread-8
Exiting Thread-1
Exiting Thread-2
Exiting Thread-10
Exiting Thread-4
Exiting Thread-5
Exiting Thread-3
Exiting Main Thread
*** Extracting predicted proteins ***
*** Running HMMER to confirm orthology of predicted proteins ***

Error: Sequence file ./run_euk_test//augustus_proteins/47983.fas is empty or misformatted


Error: Sequence file ./run_euk_test//augustus_proteins/87390.fas is empty or misformatted


Error: Sequence file ./run_euk_test//augustus_proteins/87828.fas is empty or misformatted

...

*** Parsing HMMER results ***
Total complete BUSCOs found in assembly (<2 sigma) :  3	(0 duplicated).
Total BUSCOs partially recovered (>2 sigma) :  0
Total groups searched: 429
Total BUSCOs not found:  426
Training augustus gene predictor
Will create parameters for a EUKARYOTIC species!
creating directory /share/augustus-3.1/config/species/euk_test/ ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_parameters.cfg ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_weightmatrix.txt ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_metapars.cfg ...
The necessary files for training euk_test have been created.
Now, either run etraining or optimize_parameters.pl with --species=euk_test.
etraining quickly estimates the parameters from a file with training genes.
optimize_augustus.pl alternates running etraining and augustus to find optimal metaparameters.

# Read in 2 genbank sequences.
Quantiles of the GC contents in the training set:
0%	0.433
5%	0.433	10%	0.433
15%	0.433	20%	0.433
25%	0.433	30%	0.433
35%	0.433	40%	0.433
45%	0.433	50%	0.433
55%	0.433	60%	0.433
65%	0.433	70%	0.433
75%	0.433	80%	0.433
85%	0.433	90%	0.433
95%	0.433	100%	0.44
HMM-training the parameters...
i= 0 bc= (0.237, 0.263, 0.263, 0.237)
 ** building model for exons *EXON*
start codon frequencies: ATG(2)
# admissible start codons and their probabilities: ATG(1), CTG(0), TTG(0)
 number of bases in the reading frames: 696 698 698
--- frame = 0 ---    minPatSum = 233
--- frame = 1 ---    minPatSum = 233
--- frame = 2 ---    minPatSum = 233
--- initial frame = 0 ---    minPatSum = 233
--- initial frame = 1 ---    minPatSum = 233
--- initial frame = 2 ---    minPatSum = 233
--- internal exon terminal frame = 0 ---    minPatSum = 233
--- internal exon terminal frame = 1 ---    minPatSum = 233
--- internal exon terminal frame = 2 ---    minPatSum = 233
single, initial, internal, terminal mean exon lengths :
765	189	142	234
single exon : 1
initial exon 0 : 1
initial exon 1 : 0
initial exon 2 : 0
internal exon 0 : 6
internal exon 1 : 0
internal exon 2 : 1
terminal exon : 1
Frequency of stop codons:
tag:    0 (0)
taa:    1 (0.5)
tga:    1 (0.5)
end *EXON*
Storing parameters to file...
Writing exon model parameters [1] to file /share/augustus-3.1/config/species/euk_test/euk_test_exon_probs.pbl.
*** Re-running failed predictions with different constraints, total number 426 ***
Starting to run Augustus again....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10

*** Error in `augustus': double free or corruption (fasttop): 0x0000000001a973a0 ***
Exiting Thread-10
Exiting Thread-4
Exiting Thread-5
Exiting Thread-7
Exiting Thread-1
Exiting Thread-6
Exiting Thread-3
Exiting Thread-8
Exiting Thread-2
Exiting Thread-9
Exiting Main Thread
Starting to run SED....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
Exiting Thread-1
Exiting Thread-4
Exiting Thread-2
Exiting Thread-8
Exiting Thread-6
Exiting Thread-3
Exiting Thread-7
Exiting Thread-9
Exiting Thread-5
Exiting Thread-10
Exiting Main Thread
Starting to run EXTRACT....
Starting to run HMMER....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10

Error: Sequence file ./run_euk_test//augustus_proteins/27931.fas is empty or misformatted


Error: Sequence file ./run_euk_test//augustus_proteins/37784.fas is empty or misformatted

...

Exiting Thread-9
Exiting Thread-10
Exiting Thread-2
Exiting Thread-4
Exiting Thread-1
Exiting Thread-6
Exiting Thread-3
Exiting Thread-8
Exiting Thread-5
Exiting Thread-7
Exiting Main Thread
Total running time:   390.84869170188904 seconds
Total complete BUSCOs found in assembly (<2 sigma) :  3	(0 duplicated).
Total BUSCOs partially recovered (>2 sigma) :  0
Total groups searched: 429
Total BUSCOs not found:  426

When I look at the .fas files that are reported as empty, they are indeed empty.

l ./run_euk_test//augustus_proteins/37784.fas
-rw-r--r-- 1 macmanes macmanes 0 Jun 17 12:13 ./run_euk_test//augustus_proteins/37784.fas

I'm just confused as to why this would run with the sample lineage but not with the eukaryota lineage.

Please advise.

@JustinPeyton

I got a similar error. Does anyone have a workaround?

I did notice that my log for the sample data says, "Number of training sequences is too small." It may mean that the author needs to include a non-trivial data set in order to test all of the functions.

@JustinPeyton

I wrote to one of the authors of the program, Felipe Simao, and asked him about the error. My question and his response follow.

Do you have any idea what is causing it? Is it likely to be a formatting issue?

The "empty file" error is not a 'true' error, in that it arises from some debugging information being retained in the final code. This occurs when Augustus outputs no predictions for a particular gene, which results in an empty file that HMMER reports as an error. The reporting of this error does not influence the results of the BUSCO benchmarks; there were no predictions in the empty files to begin with.
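
To see how many files fall into that "no prediction" case, something like the following might help. It is only a minimal sketch, not part of BUSCO: the function name `split_by_emptiness` is made up for illustration, and it assumes the protein files sit in one directory and end in `.fas`, as in the log above.

```python
from pathlib import Path


def split_by_emptiness(protein_dir, pattern="*.fas"):
    """Return (empty, non_empty) lists of file names found in protein_dir.

    Hypothetical helper: a file counts as "empty" when its size is 0 bytes,
    which is what the log above reports for files with no Augustus prediction.
    """
    empty, non_empty = [], []
    for path in sorted(Path(protein_dir).glob(pattern)):
        if path.stat().st_size == 0:
            empty.append(path.name)
        else:
            non_empty.append(path.name)
    return empty, non_empty
```

Calling `split_by_emptiness("./run_euk_test/augustus_proteins")` would give the two lists for the run shown above. If nearly every file is empty, that points to Augustus itself failing rather than to genuinely missing genes.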

@marcelauliano

I wonder if you guys have any new feedback on this?

My situation is similar to macmanes's: I'm running with the eukaryote lineage and get a lot of empty .fas files, with most BUSCOs not found. I understand that there may simply have been no predictions for these files, but that seems like far too few predictions to me, since I have a pretty decent draft genome. I wonder if I'm doing something wrong?

The BUSCO example runs fine...
Any thoughts?

Thanks guys!

@jeanlain

jeanlain commented Feb 4, 2016

We're having the same issue with the arthropoda dataset on the Drosophila melanogaster reference genome ("genome" mode). There are many empty files, including those ending in .1 to .3.

EDIT: on inspection, the empty files are not that numerous. This may be normal.

@kbrevs

kbrevs commented Feb 4, 2016

Hey all,

We are having the same issue, but it appears to affect all of the files. It even happens when running the example files, so I don't think the cause is genuinely missing predictions. I also went into the tblastn results, and 2667 of the 2675 "BUSCOs" have matches. I'm not sure how post-processing winnows that down, but it seems like there really is a programmatic issue. Does anyone know why those files are remaining empty/misformatted? Thanks!
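
For anyone wanting to repeat the tblastn count, here is a rough sketch of how it could be done. It assumes the tblastn results are in BLAST tabular format (query ID in the first tab-separated column, optional comment lines starting with `#`); that format is an assumption on my part, and `count_queries_with_hits` is a hypothetical name, not a BUSCO function.

```python
def count_queries_with_hits(tblastn_path):
    """Count distinct query IDs that have at least one hit line.

    Assumes BLAST tabular output: one hit per line, tab-separated,
    query ID first. Comment lines ("#...") and blank lines are skipped.
    """
    queries = set()
    with open(tblastn_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            queries.add(line.split("\t", 1)[0])
    return len(queries)
```

Comparing this number with the count of non-empty files in augustus_proteins shows how many candidates are lost between the tblastn step and the Augustus step.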

@mtollis

mtollis commented Mar 6, 2016

Same issue. Was a solution ever found? Lots of tblastn hits, but empty Augustus results.

@hovdebt

hovdebt commented Mar 15, 2016

kbrevs, I was having the same issue, with ALL Augustus protein sequence files being empty, but I realized I was running Augustus 3.2.X, not the specified Augustus 3.0.X. Changing to the right version of Augustus fixed the issue for me. This is not the same issue as the opening post, though.
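
A small guard like the one below could catch this kind of mismatch before a long run. It is only a sketch: `matches_required_series` is a made-up name, and it takes the version string as input rather than querying Augustus, because how you obtain that string depends on your installation.

```python
def matches_required_series(version, required=(3, 0)):
    """Return True if a version string such as "3.0.3" is in the required series.

    Only the leading components are compared, so with required=(3, 0)
    any 3.0.x string passes, while 3.1.x or 3.2.x does not.
    """
    parts = version.strip().split(".")
    try:
        found = tuple(int(p) for p in parts[:len(required)])
    except ValueError:
        # A non-numeric leading component cannot be compared
        return False
    return found == tuple(required)
```

For example, `matches_required_series("3.2.1")` is false, which is the situation described above.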

@mtollis

mtollis commented May 6, 2016

Yes, this fixed the issue for me as well. Fortunately for future users, the new version of BUSCO eliminates the dependency on Augustus 3.0.x (among other things).
