I am having an issue running the sample data. Running with the -l example
flag works and the results match the expected results, but running with -l eukaryota
produces the error detailed below.
python3 ../BUSCO_v1.1b1.py -o euk_test -in target.fa -l eukaryota -m genome -c 10
eukaryota
*** Running tBlastN ***
Building a new DB, current time: 06/17/2015 12:07:21
New DB name: euk_test
New DB title: target.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0191998 seconds.
*** Getting coordinates for candidate regions! ***
*** pre-Augustus scaffold extraction ***
*** Running Augustus prediction ***
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
*** Error in `augustus': double free or corruption (fasttop): 0x0000000002e96c50 ***
Exiting Thread-9
Exiting Thread-6
Exiting Thread-7
Exiting Thread-8
Exiting Thread-1
Exiting Thread-2
Exiting Thread-10
Exiting Thread-4
Exiting Thread-5
Exiting Thread-3
Exiting Main Thread
*** Extracting predicted proteins ***
*** Running HMMER to confirm orthology of predicted proteins ***
Error: Sequence file ./run_euk_test//augustus_proteins/47983.fas is empty or misformatted
Error: Sequence file ./run_euk_test//augustus_proteins/87390.fas is empty or misformatted
Error: Sequence file ./run_euk_test//augustus_proteins/87828.fas is empty or misformatted
...
*** Parsing HMMER results ***
Total complete BUSCOs found in assembly (<2 sigma) : 3 (0 duplicated).
Total BUSCOs partially recovered (>2 sigma) : 0
Total groups searched: 429
Total BUSCOs not found: 426
Training augustus gene predictor
Will create parameters for a EUKARYOTIC species!
creating directory /share/augustus-3.1/config/species/euk_test/ ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_parameters.cfg ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_weightmatrix.txt ...
creating /share/augustus-3.1/config/species/euk_test/euk_test_metapars.cfg ...
The necessary files for training euk_test have been created.
Now, either run etraining or optimize_parameters.pl with --species=euk_test.
etraining quickly estimates the parameters from a file with training genes.
optimize_augustus.pl alternates running etraining and augustus to find optimal metaparameters.
# Read in 2 genbank sequences.
Quantiles of the GC contents in the training set:
0% 0.433
5% 0.433 10% 0.433
15% 0.433 20% 0.433
25% 0.433 30% 0.433
35% 0.433 40% 0.433
45% 0.433 50% 0.433
55% 0.433 60% 0.433
65% 0.433 70% 0.433
75% 0.433 80% 0.433
85% 0.433 90% 0.433
95% 0.433 100% 0.44
HMM-training the parameters...
i= 0 bc= (0.237, 0.263, 0.263, 0.237)
** building model for exons *EXON*
start codon frequencies: ATG(2)
# admissible start codons and their probabilities: ATG(1), CTG(0), TTG(0)
number of bases in the reading frames: 696 698 698
--- frame = 0 --- minPatSum = 233
--- frame = 1 --- minPatSum = 233
--- frame = 2 --- minPatSum = 233
--- initial frame = 0 --- minPatSum = 233
--- initial frame = 1 --- minPatSum = 233
--- initial frame = 2 --- minPatSum = 233
--- internal exon terminal frame = 0 --- minPatSum = 233
--- internal exon terminal frame = 1 --- minPatSum = 233
--- internal exon terminal frame = 2 --- minPatSum = 233
single, initial, internal, terminal mean exon lengths :
765 189 142 234
single exon : 1
initial exon 0 : 1
initial exon 1 : 0
initial exon 2 : 0
internal exon 0 : 6
internal exon 1 : 0
internal exon 2 : 1
terminal exon : 1
Frequency of stop codons:
tag: 0 (0)
taa: 1 (0.5)
tga: 1 (0.5)
end *EXON*
Storing parameters to file...
Writing exon model parameters [1] to file /share/augustus-3.1/config/species/euk_test/euk_test_exon_probs.pbl.
*** Re-running failed predictions with different constraints, total number 426 ***
Starting to run Augustus again....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
*** Error in `augustus': double free or corruption (fasttop): 0x0000000001a973a0 ***
Exiting Thread-10
Exiting Thread-4
Exiting Thread-5
Exiting Thread-7
Exiting Thread-1
Exiting Thread-6
Exiting Thread-3
Exiting Thread-8
Exiting Thread-2
Exiting Thread-9
Exiting Main Thread
Starting to run SED....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
Exiting Thread-1
Exiting Thread-4
Exiting Thread-2
Exiting Thread-8
Exiting Thread-6
Exiting Thread-3
Exiting Thread-7
Exiting Thread-9
Exiting Thread-5
Exiting Thread-10
Exiting Main Thread
Starting to run EXTRACT....
Starting to run HMMER....
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Starting Thread-6
Starting Thread-7
Starting Thread-8
Starting Thread-9
Starting Thread-10
Error: Sequence file ./run_euk_test//augustus_proteins/27931.fas is empty or misformatted
Error: Sequence file ./run_euk_test//augustus_proteins/37784.fas is empty or misformatted
...
Exiting Thread-9
Exiting Thread-10
Exiting Thread-2
Exiting Thread-4
Exiting Thread-1
Exiting Thread-6
Exiting Thread-3
Exiting Thread-8
Exiting Thread-5
Exiting Thread-7
Exiting Main Thread
Total running time: 390.84869170188904 seconds
Total complete BUSCOs found in assembly (<2 sigma) : 3 (0 duplicated).
Total BUSCOs partially recovered (>2 sigma) : 0
Total groups searched: 429
Total BUSCOs not found: 426
When I look at the .fas files that are reported as empty,
they are indeed empty:
ls -l ./run_euk_test//augustus_proteins/37784.fas
-rw-r--r-- 1 macmanes macmanes 0 Jun 17 12:13 ./run_euk_test//augustus_proteins/37784.fas
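Rather than checking the files one at a time, a minimal sketch like the one below lists every zero-byte .fas file in one go. It assumes the output directory from the command above (run_euk_test); the helper name list_empty_fas is hypothetical and not part of BUSCO.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BUSCO): print every zero-byte .fas file
# in an Augustus protein directory, i.e. the kind of file HMMER reported
# as "empty or misformatted" in the log above.
list_empty_fas() {
  # Default to the directory named in the log; pass another path as the
  # first argument if your run used a different -o name.
  local dir="${1:-./run_euk_test/augustus_proteins}"
  # -size 0c matches files that are exactly 0 bytes long, like 37784.fas.
  find "$dir" -maxdepth 1 -type f -name '*.fas' -size 0c
}
```

For example, list_empty_fas | wc -l counts the empty files in the default directory, which shows how many of the predictions came back with nothing.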
I'm just confused as to why this runs with the sample
lineage but not with the eukaryota
lineage.
Please advise.
I got a similar error. Does anyone have a workaround?
I did notice that the log for my sample-data run says, "Number of training sequences is too small." This may mean that the author needs to include a non-trivial dataset in order to exercise all of the functions.