

@macmanes
Last active June 1, 2022 19:39
GeneMark/Braker issue

I'm having an issue running GeneMark within BRAKER.

Versions

perl /share/braker.pl --version
braker.pl version 1.6

perl /share/gm_et_linux_64/gmes_petap/gmes_petap.pl
# -------------------
Usage:  /share/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]  --sequence [filename]

GeneMark-ES Suite version 4.30
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Running BRAKER:

perl /share/braker.pl \
--genome ../genome/Mya.genome.v1.1.1.fasta \
--bam ../genome/transcriptome.clam.v.1.11.bam \
--BAMTOOLS_PATH=/share/bamtools/bin/ \
--UTR on --cores 10 --species=Mya_a
NEXT STEP: check files and settings
NEXT STEP: check options
... options check complete.
NEXT STEP: check fasta headers

fasta headers check complete.
NEXT STEP: create SAM header file /mouse/Mya/maker/braker/Mya_a/transcriptome_header.sam.
SAM file /mouse/Mya/maker/braker/Mya_a/transcriptome_header.sam complete.
NEXT STEP: check BAM headers
headers check for BAM file /mouse/Mya/maker/../genome/transcriptome.clam.v.1.11.bam complete.
NEXT STEP: make hints from BAM file /mouse/Mya/maker/../genome/transcriptome.clam.v.1.11.bam
Wait a moment, calculating maximum block size that needs to be allocated... .. done

hints from BAM file /mouse/Mya/maker/../genome/transcriptome.clam.v.1.11.bam added.
NEXT STEP: sort hints
hints sorted.
NEXT STEP: summarize multiple identical hints to one
hints joined.
NEXT STEP: filter introns, find strand and change score to 'mult' entry
strands found and score changed.
hints file complete.
NEXT STEP: execute GeneMark-ET
failed to execute: perl /share/gm_et_linux_64/gmes_petap//gmes_petap.pl --sequence=/mouse/Mya/maker/braker/Mya_a/genome.fa --ET=/mouse/Mya/maker/braker/Mya_a/hintsfile.gff --cores=10 1>/mouse/Mya/maker/braker/Mya_a/GeneMark-ET.stdout 2>/mouse/Mya/maker/braker/Mya_a/errors/GeneMark-ET.stderr

The stderr and stdout files are both empty.

Here is gmes.log:

more gmes.log
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:52:40 2015] /share/gm_et_linux_64/gmes_petap/probuild --reformat_fasta --uppercase --allow_x --letters_per_line 60 --out data/dna.fna --label _dna --trace info/dna.trace --in /mouse/Mya/maker/braker/Mya/genome.fa
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:52:41 2015] /share/gm_et_linux_64/gmes_petap/reformat_gff.pl --out data/et.gff  --trace info/dna.trace  --in /mouse/Mya/maker/braker/Mya/hintsfile.gff  --quiet
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:53:54 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --stat info/dna.general
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:55:11 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --stat_fasta info/dna.multi_fasta
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:55:46 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --substring_n_distr info/dna.gap_distr
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:57:23 2015] /share/gm_et_linux_64/gmes_petap/gc_distr.pl --in data/dna.fna  --out info/dna.gc.csv  --w 1000,8000
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:58:38 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq /mouse/Mya/maker/data/dna.fna  --split dna.fa  --max_contig 5000000 --min_contig 50000 --letters_per_line 100 --split_at_n 5000 --split_at_x 5000 --allow_x --x_to_n  --trace ../../info/training.trace
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:59:05 2015] /share/gm_et_linux_64/gmes_petap/rescale_gff.pl  --in data/et.gff  --trace info/training.trace  --out data/et_training.gff
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:59:22 2015] /share/gm_et_linux_64/gmes_petap/probuild --seq data/training.fna --stat info/training.general --allow_x  --GC_PRECISION 0
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:59:26 2015] /share/gm_et_linux_64/gmes_petap/parse_by_introns.pl  --section ET_ini  --cfg  /mouse/Mya/maker/run.cfg  --parse_dir /mouse/Mya/maker/run/ET_ini
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:59:27 2015] /share/gm_et_linux_64/gmes_petap/make_nt_freq_mat.pl --cfg /mouse/Mya/maker/run.cfg --section donor_GT    --format DONOR
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 08:59:27 2015] error
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:06:36 2015] /share/gm_et_linux_64/gmes_petap/probuild --reformat_fasta --uppercase --allow_x --letters_per_line 60 --out data/dna.fna --label _dna --trace info/dna.trace --in /mouse/Mya/maker/braker/Mya/genome.fa
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:06:37 2015] /share/gm_et_linux_64/gmes_petap/reformat_gff.pl --out data/et.gff  --trace info/dna.trace  --in /mouse/Mya/maker/braker/Mya/hintsfile.gff  --quiet
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:07:45 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --stat info/dna.general
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:09:06 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --stat_fasta info/dna.multi_fasta
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:09:37 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq data/dna.fna  --allow_x  --substring_n_distr info/dna.gap_distr
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:11:08 2015] /share/gm_et_linux_64/gmes_petap/gc_distr.pl --in data/dna.fna  --out info/dna.gc.csv  --w 1000,8000
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:12:23 2015] /share/gm_et_linux_64/gmes_petap/probuild  --seq /mouse/Mya/maker/data/dna.fna  --split dna.fa  --max_contig 5000000 --min_contig 50000 --letters_per_line 100 --split_at_n 5000 --split_at_x 5000 --allow_x --x_to_n  --trace ../../info/training.trace
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:12:51 2015] /share/gm_et_linux_64/gmes_petap/rescale_gff.pl  --in data/et.gff  --trace info/training.trace  --out data/et_training.gff
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:13:07 2015] /share/gm_et_linux_64/gmes_petap/probuild --seq data/training.fna --stat info/training.general --allow_x  --GC_PRECISION 0
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:13:12 2015] /share/gm_et_linux_64/gmes_petap/parse_by_introns.pl  --section ET_ini  --cfg  /mouse/Mya/maker/run.cfg  --parse_dir /mouse/Mya/maker/run/ET_ini
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:13:12 2015] /share/gm_et_linux_64/gmes_petap/make_nt_freq_mat.pl --cfg /mouse/Mya/maker/run.cfg --section donor_GT    --format DONOR
/share/gm_et_linux_64/gmes_petap//gmes_petap.pl : [Mon Oct  5 09:13:12 2015] error

It looks like the last command failed: /share/gm_et_linux_64/gmes_petap/make_nt_freq_mat.pl --cfg /mouse/Mya/maker/run.cfg --section donor_GT --format DONOR

/share/gm_et_linux_64/gmes_petap/make_nt_freq_mat.pl --cfg /mouse/Mya/maker/run.cfg --section donor_GT    --format DONOR
error, no valid sequences were found

From the cfg file:

donor_GT:
  auto_order: 1
  format_out: ''
  gc_high: -1
  gc_low: -1
  infile: /mouse/Mya/maker/braker/Mya_a/GeneMark-ET/run/ET_ini/don.seq
  margin: 3
  order: 0
  outfile: GT.mat
  phase: ''
  pseudocounts: 10
  quite: 0
  site_size: 2
  threshold_zero: 2000
  type: GT
  width: 9

Of note, the cfg file originally listed don.seq with no path, and I was getting file-not-found errors for that file. I found don.seq in another directory, so I changed the config file accordingly. However, /mouse/Mya/maker/braker/Mya_a/GeneMark-ET/run/ET_ini/don.seq is empty, so the error must be further upstream.
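A quick way to confirm that the ET_ini step produced nothing (rather than the path simply being wrong) is to list the zero-byte files in that directory. This is a minimal Python sketch, not part of GeneMark or BRAKER; the directory argument is whatever path your own run uses:

```python
import sys
from pathlib import Path


def empty_files(directory):
    """Return the sorted names of zero-byte regular files directly inside `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python empty_files.py /mouse/Mya/maker/braker/Mya_a/GeneMark-ET/run/ET_ini
    for name in empty_files(sys.argv[1]):
        print("empty:", name)
```

If don.seq, acc.seq and the other ET_ini files all show up as empty, the problem lies in the hints that parse_by_introns.pl read, not in make_nt_freq_mat.pl itself.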

Any help greatly appreciated.

@esrice

esrice commented Nov 28, 2015

I'm having this problem as well. Did you ever figure it out?

Thanks!

@peterthorpe5

Me too! Any solution? I'm using BRAKER 1.8.

@peterthorpe5

OK, I think I have it. When I tried running GeneMark, the error came back with:
"inappropriate ioctl for device" ... and more.
Searching for this problem:
(http://stackoverflow.com/questions/1605195/inappropriate-ioctl-for-device)
identified the underlying cause as a missing
GLIBCXX_3.4.20
(http://www.unix.com/shell-programming-and-scripting/20812-inappropriate-ioctl-device.html)

How to fix:
(http://askubuntu.com/questions/575505/glibcxx-3-4-20-not-found-how-to-fix-this-error)
sudo apt-get install libstdc++6

This didn't update anything for me, as the latest version was already installed.

BUT .....

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

These fixed the problem.
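Before adding a toolchain PPA, it may be worth checking which GLIBCXX symbol versions the installed libstdc++ actually provides (the usual shell idiom is `strings libstdc++.so.6 | grep GLIBCXX`). The Python sketch below does the same scan; the library path in the comment is an assumption that varies between distributions:

```python
import re
import sys

# Symbol-version tags look like GLIBCXX_3.4.20 inside libstdc++.so.6.
TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob: bytes):
    """Return the distinct GLIBCXX version numbers found in `blob`, sorted numerically."""
    found = {m.decode() for m in TAG.findall(blob)}
    return sorted(found, key=lambda v: tuple(int(x) for x in v.split(".")))


def has_glibcxx(blob: bytes, version: str) -> bool:
    """True if the exact version tag (e.g. '3.4.20') occurs in the binary."""
    return version in glibcxx_versions(blob)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed location; on 64-bit Ubuntu it is often /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    with open(sys.argv[1], "rb") as fh:
        data = fh.read()
    print("GLIBCXX_3.4.20 present:", has_glibcxx(data, "3.4.20"))
```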

@cwoehle

cwoehle commented May 11, 2016

I had the same problem and found a solution (at least for my case).

My genome assembly consisted of a lot of small scaffolds.
By default, GeneMark seems to remove contigs shorter than 50 kb, so only one contig remained.
I changed the "min_contig" argument in the "gmes.cfg" file and now it seems to work, although I'm not sure whether some
of the other parameters also need adjusting.
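To see how many scaffolds would survive a given cutoff before editing gmes.cfg, you can count the contigs at or above it. A minimal Python sketch; the default of 50000 matches the --min_contig value visible in the gmes.log in the original post, and treating the cutoff as inclusive (>=) is an assumption about GeneMark's behaviour:

```python
import sys


def contig_lengths(lines):
    """Yield (name, length) for each FASTA record; the name is the first word of the header."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def count_long_contigs(lines, min_contig=50000):
    """Return (contigs with length >= min_contig, total contigs)."""
    lengths = [n for _, n in contig_lengths(lines)]
    return sum(n >= min_contig for n in lengths), len(lengths)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python long_contigs.py genome.fa [min_contig]
    cutoff = int(sys.argv[2]) if len(sys.argv) > 2 else 50000
    with open(sys.argv[1]) as fh:
        kept, total = count_long_contigs(fh, cutoff)
    print(f"{kept} of {total} contigs are >= {cutoff} bp")
```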

@philippbayer

Hi! I just found this rather randomly when I ran into the above "inappropriate ioctl for device" problem. Here's how I fixed that particular problem for me.

I checked braker.log in the output folder which revealed that the last command it tried to run was this one:

augustus-3.2.2/config/../bin/bam2hints --intronsonly --in=/../all_merged.bam --out=/..//braker/pan/bam2hints.temp.gff

Running that command manually revealed an error message that BRAKER1 had been throwing away:

augustus-3.2.2/config/../bin/bam2hints: error while loading shared libraries: libbamtools.so.2.4.0: cannot open shared object file: No such file or directory

So what happened (for me) is that when recompiling bam2hints I messed up and gave it the wrong path to bamtools' libbamtools.a, but for some crazy reason it still compiled. Recompiling bam2hints with the proper LIBS path, as described in the bam2hints README, then worked:

LIBS = /path/to/bamtools/lib/libbamtools.a -lz

After that the annoying "inappropriate ioctl for device" disappeared.
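Since BRAKER can swallow the loader error, it can help to ask the dynamic linker directly which libraries a binary cannot resolve. The standard tool is `ldd`; this Python wrapper is only a sketch that scans its output for "not found" entries (the bam2hints path in the comment is an example):

```python
import subprocess
import sys


def missing_libs(ldd_output: str):
    """Return the library names that ldd reports as 'not found'."""
    return [
        line.split("=>")[0].strip()
        for line in ldd_output.splitlines()
        if "=> not found" in line
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python missing_libs.py augustus-3.2.2/bin/bam2hints
    result = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True)
    for lib in missing_libs(result.stdout):
        print("missing:", lib)
```

After relinking against libbamtools.a, libbamtools should no longer appear in the ldd output at all, because a static archive creates no runtime dependency.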

@cgcjacobs

I know this question is old, but this is for other people with the same problem.
I had a similar problem: GeneMark wouldn't run properly. It turned out I had two problems.
The first one was simple. I had set the path to GeneMark as:

GENEMARK_PATH=/home/user/software/gm_et_linux_64/gmes_petap/

This causes BRAKER to call GeneMark as /home/user/software/gm_et_linux_64/gmes_petap//gmes_petap.pl, which has one forward slash too many. I see you have the same problem.

The second problem was with the FASTA headers. I had headers like:

>Scaffold1 | length4564827

This gave an error. GeneMark removes the second part, but something goes wrong when it tries to link the two back together.
With the gedit text editor in Ubuntu you can easily remove everything after >Scaffold1: open search and replace, tick "Match as regular expression", search for \s*\|.* and leave the replace field empty. With that file as input, BRAKER had no problems and finished successfully.

You can adapt the search pattern as needed. The ".*" matches everything after the "|" sign, which has to be escaped as "\|" because an unescaped | means "or" in a regular expression, and "\s*" also removes the space before it. If your headers use a different separator, put it in place of the \|; everything from that character to the end of the line will be removed.
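The same cleanup can be scripted instead of done in an editor. A minimal Python sketch (a stand-in for the gedit search-and-replace, not a BRAKER tool) that keeps only the first word of each header, treating whitespace and | as separators. If you also have a BAM file, its reference names must still match the cleaned headers; many aligners already keep only the first word, but check with `samtools view -H`:

```python
import re
import sys

# Everything from the first whitespace or '|' to the end of the header.
HEADER_TAIL = re.compile(r"[\s|].*$")


def clean_header(line: str) -> str:
    """Keep only the first word of a FASTA header; sequence lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    return ">" + HEADER_TAIL.sub("", line[1:].strip()) + "\n"


def clean_fasta(lines):
    """Apply clean_header to every line of a FASTA file."""
    for line in lines:
        yield clean_header(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python clean_headers.py genome.fa > genome.clean.fa
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(clean_fasta(fh))
```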

Hope this will be helpful for someone.

@MatteoSchiavinato

MatteoSchiavinato commented Aug 4, 2016

EDIT:
Thank you for all the comments and suggestions; your reports already helped me fix two problems!

I have another one.

This is what my BRAKER1 run prints when it crashes:

failed to execute: perl /software/genemark/GeneMark_ES_ET_4.32/gmes_petap.pl --sequence=/../genome.fa --ET=/../hintsfile.gff --cores=20 1>/../GeneMark-ET.stdout 2>/../GeneMark-ET.stderr

Looking at the files from the 1> and 2> redirections, the stdout file has a suspicious line:

error on call: /software/genemark/GeneMark_ES_ET_4.32/parse_ET.pl --section ET_C --cfg /../run.cfg --v

And so does the stderr file:

Use of uninitialized value in addition (+) at /software/genemark/GeneMark_ES_ET_4.32/parse_ET.pl line 265.
Use of uninitialized value in addition (+) at /software/genemark/GeneMark_ES_ET_4.32/parse_ET.pl line 266.
Use of uninitialized value in division (/) at /software/genemark/GeneMark_ES_ET_4.32/parse_ET.pl line 266.
Must input more than one data point! at /software/genemark/GeneMark_ES_ET_4.32/parse_ET.pl line 213.
Invalid regression data

Any help is appreciated!

@cgcjacobs the extra forward slash shouldn't change anything; consecutive slashes in a path are interpreted as one :)
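POSIX path resolution treats a run of consecutive slashes as a single separator (only a leading pair of slashes is implementation-defined), so gmes_petap//gmes_petap.pl resolves to the same file. Python's posixpath.normpath applies the same rule lexically, which makes for a quick check; note that it also collapses ".." without consulting symlinks, so this is a sketch rather than a filesystem test:

```python
import posixpath


def same_lexical_path(a: str, b: str) -> bool:
    """True if two POSIX paths match after lexical normalisation ('//', '.', '..' collapsed)."""
    return posixpath.normpath(a) == posixpath.normpath(b)


if __name__ == "__main__":
    call = "/share/gm_et_linux_64/gmes_petap//gmes_petap.pl"
    print(posixpath.normpath(call))  # /share/gm_et_linux_64/gmes_petap/gmes_petap.pl
```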

@sagarutturkar

I am stuck at the same step as above:

First, BRAKER generated an error at step:

perl /group/bioinfo/apps/apps/GeneMark-4.32/gmes_petap.pl --sequence=genome.fa \
--ET=hintsfile.gff --cores=20 \
1>GeneMark-ET.stdout 2>GeneMark-ET.stderr

If I rerun this step directly in GeneMark with the parameter "min_contig =10000", I get the following:

/group/bioinfo/apps/apps/GeneMark-4.32/gmes_petap.pl : [Tue Feb 28 15:51:11 2017] GeneMark-4.32/make_nt_freq_mat.pl --cfg run.cfg --section stop_TAG --format TERM_TAG
gmes_petap.pl : [Tue Feb 28 15:51:11 2017] error

Please post if anyone has found a solution for this.

@Leomajul87

Hi, I have exactly the same problem. Did you solve it? I know my sequences are not very good, but I believe GeneMark should at least run...

@philippbayer

Hi @MatteoSchiavinato , @sagarutturkar, and @Leomajul87
I just hit the same 'Invalid regression data' error in BRAKER. In my case, filterIntronsFindStrand.stderr in BRAKER's errors folder had many messages about missing contigs. I realised that the BAM file I used was slightly incomplete; I had made it with samtools merge from several different libraries. Another warning sign was that GeneMark-ET always reported '6 contigs in training', which is far too few; no wonder it didn't have enough data for a regression.

I remade the merged bam file using picard's MergeSamFiles with MERGE_SEQUENCE_DICTIONARIES=true instead of using samtools merge, then BRAKER worked fine. With the new bam file GeneMark-ET also reported 370 and 98,746 contigs in training, much better!
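One way to spot an incomplete merged BAM before running BRAKER is to compare the @SQ names in its header with the FASTA headers of the genome. A Python sketch working on the text output of `samtools view -H` and the genome FASTA (file names are placeholders); a long list of missing contigs would point to the kind of merge problem described above:

```python
import sys


def sam_header_contigs(header_lines):
    """Reference names (SN: tags) from the @SQ lines of a SAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(lines):
    """First word of every FASTA header line."""
    return [line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].split()]


def missing_from_bam(fasta_names, bam_names):
    """Genome contigs that have no @SQ entry in the BAM header, in FASTA order."""
    present = set(bam_names)
    return [n for n in fasta_names if n not in present]


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: samtools view -H merged.bam > header.sam
    #        python check_sq.py header.sam genome.fa
    with open(sys.argv[1]) as h, open(sys.argv[2]) as f:
        missing = missing_from_bam(fasta_contigs(f), sam_header_contigs(h))
    print(len(missing), "genome contigs missing from the BAM header")
```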

@philippbayer

philippbayer commented Jan 5, 2018

I've now also run into the dreaded 'error, no valid sequences were found' error that OP originally had. In my case, I used my own hints.gff file from bam2hints, NOT a BAM file. It all boiled down to a wonky GFF file, since the GFF file BRAKER itself generates from a BAM file is formatted differently from the one I gave it.

tl;dr: Either run braker from a bam file and not a gff file, or run one command manually on your input gff file (see last lines), or check the scores column (column 6) in the hints.gff file BRAKER generated to see if it's too low for GeneMark's cutoff

My notes:

GeneMark log:

check before run
create directories
commit input data
data report
commit training data
training data report
prepare initial model
get GC of sequence
GC 37
build initial ET model
error, no valid sequences were found
error on call: ~/gmes_petap/make_nt_freq_mat.pl --cfg <snip>run.cfg --section donor_GT    --format DONOR

Command that caused it:

 perl ~/gm_et_linux_64/gmes_petap/gmes_petap.pl --verbose --sequence=genome.fa --ET=genemark_hintsfile.gff --cores=16 --soft 1000

First observation: in gmes_petap.pl 4.33 the flag is named 'soft_mask', not 'soft', so I changed that. Looking at the code, '--soft' seems to do nothing. In the BRAKER code there's a commented-out line using 'soft_mask', so it seems that a recent GeneMark version changed it to 'soft' and then changed it back to 'soft_mask'. I've emailed the BRAKER maintainers about that.

Then I realised that all files in run/ET_ini were empty, including acc.seq, don.seq, intron.len and parse.et_ini.
This is the command that makes the files in that folder:

~/gmes_petap/parse_by_introns.pl  --section ET_ini  --cfg run.cfg  --parse_dir run/ET_ini

Rerunning it manually prints no error, but rerunning it with the --debug flag enabled prints this at the end:

0  0
GC donors: 0

I'm sure there should be some numbers other than 0 there :)

Before that it printed this:

From 3147846 loaded 0 and ignored dublications 0

which, looking at the code, is printed after loading the input GFF3. It looked at 3 million features and loaded zero of them, so it had nothing to train on, yet it silently ignored that. There are hundreds of reasons why the input GFF3 file could be malformed; in my specific case, this check never triggered:

380             if( $tmp{'type'} ne $label ) {next;}
381 
382             if( $tmp{'score'} ne '.' and  $tmp{'score'} < $score ) { next; }
383 
384             $uniq_id = $id .'_'. $tmp{'left'} .'_'. $tmp{'right'};
385 
386             $tmp{'uniq_id'} = $uniq_id;
387 
388             if( ! exists $check{$uniq_id} )
389             {
390                 $check{$uniq_id} = 1;
391                 push @{$ref->{$id}}, { %tmp };
392                 ++$count_in;
393             }
394             else
395             {
396                 ++$count_dub;
397             }

$count_in stores the number of loaded lines (0 in my case) and $count_dub counts the 'dublications' (also 0). Every line that reaches the if-exists check increments one of the two, so no line ever reached it, which means the two checks on lines 380 and 382 must always have triggered 'next' (skip to the next line).
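The two checks can be reproduced outside GeneMark to see, for a whole hints file, how many lines each one discards. A Python sketch, assuming a nine-column, tab-separated GFF with the type in column 3 and the score in column 6, and the default et_score cutoff of 4 mentioned further down; the case-insensitive type comparison is an assumption (the hints file says 'intron', while GeneMark's debug dump says 'INTRON'):

```python
import sys
from collections import Counter


def filter_reason(gff_type, score, label="INTRON", min_score=4.0):
    """Which of the two checks (type, then score) would skip this hint, or 'kept'."""
    # Assumption: case-insensitive type comparison (see lead-in).
    if gff_type.upper() != label:
        return "type"
    if score != ".":
        try:
            if float(score) < min_score:
                return "score"
        except ValueError:
            return "malformed"
    return "kept"


def tally(gff_lines, label="INTRON", min_score=4.0):
    """Count kept vs. skipped lines in a nine-column, tab-separated hints file."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            counts["malformed"] += 1
            continue
        counts[filter_reason(cols[2], cols[5], label, min_score)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(dict(tally(fh)))
```

A result dominated by "score" (or "type") reproduces the zero-sized training set without having to patch parse_by_introns.pl.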

So I changed that slightly:

380             if( $tmp{'type'} ne $label ) {
381                 print "filtering line due to type label $label\n";
382                 next;}
383 
384             if( $tmp{'score'} ne '.' and  $tmp{'score'} < $score ) {
385                 print "filtering line due to score $score\n";
386                 next; }

and what do you know, it's all this:

filtering line due to score 4

I'm not 100% sure where that 4 is coming from (later edit: I made a mistake in my code; that's just the score cutoff GeneMark uses to filter out introns not supported by many reads). I then set the minimum intron score (et_score) to 3 in run.cfg. That printed a ton of 'filtering line due to score 2', and again I got a training set of size 0. So I set et_score to 0. That changed the error to the following (only visible when run with the --debug flag):

warning, ignored intron without strand information
$VAR1 = {
          'left' => '149122',
          'right' => 202236,
          'uniq_id' => 'dna.fa_1357_149122_202236',
          'strand' => '.',
          'type' => 'INTRON',
          'score' => 0
        };

at which point I realised that all of my introns had no strand, it was all '.' instead of '+' or '-'. Also the 6th column (the score column) was all 0s instead of large-ish numbers (500 etc.). Looking at another BRAKER run started with a bam file I saw that the hints.gff file had strand information there and scores in that column. I also realised that my BRAKER run never ran this command which it ran in my working BRAKER run based on a bam file:

# Fri Dec 15 02:40:21 2017: filter introns, find strand and change score to 'mult' entry
perl BRAKER_v2.0/filterIntronsFindStrand.pl OLD_RUN/genome.fa OLD_RUN/hintsfile.temp.gff --score 1>OLD_RUN/hintsfile.gff 2>OLD_RUN/errors/filterIntronsFindStrand.stderr
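A quick sanity check for this situation is to tally the strand column (column 7) and the score column (column 6) of the intron hints before handing the file to BRAKER. A Python sketch assuming standard nine-column GFF; a file in which almost every intron has strand '.' and score 0 looks like the unprocessed hints described above:

```python
import sys
from collections import Counter


def is_unscored(score: str) -> bool:
    """Treat '.', a numeric zero, or a non-numeric value as 'no usable score'."""
    if score == ".":
        return True
    try:
        return float(score) == 0.0
    except ValueError:
        return True


def intron_summary(gff_lines):
    """Return (intron count, Counter of strand values, count of unscored introns)."""
    total, strands, unscored = 0, Counter(), 0
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2].lower() != "intron":
            continue
        total += 1
        strands[cols[6]] += 1
        unscored += is_unscored(cols[5])
    return total, strands, unscored


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        total, strands, unscored = intron_summary(fh)
    print(f"{total} introns; strands {dict(strands)}; {unscored} with score '.' or 0")
```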

Fix

So I ran that command manually on my own gff file, which THEN gave me a good gff file:

perl ../../BRAKER_v2.0/filterIntronsFindStrand.pl genome.fa hintsfile.gff --score > fixed_hintsfile.gff 2> error.gff

That ran for a few seconds, I replaced my hintsfile with that new fixed_hintsfile. I still kept on getting many 'filtering due to low score' messages from introns only supported by one read, but I want to discard those anyway, that's the whole point.

To repeat: manually run BRAKER's filterIntronsFindStrand.pl which adds the strands to your gff file and perhaps adjust the et_score cut-off to 0 if it's also all 4 for you, then it works.

Phew.

I don't know how to fix this error when running BRAKER on a BAM file directly, but the et_score cutoff would be a good place to start. I used a ton of RNA-Seq data in this run (all the B. oleracea libraries SRA had), so that could cause weird scores, but I still see many large numbers in the score column, so I don't get it. It runs now, though, and I've already spent too much time on this. Perhaps I could've left it at 4. Later edit: I get it now; my debug message is just wonky. I have many lines with a score of 1 (only one read supports the intron?), and my print statement shows not the score of that line but the default et_score cutoff of 4, which is why it's always 4.

P.S.: Perhaps the 4 was chosen by fair dice roll..


ghost commented Feb 2, 2018

Besides all the problems you have encountered, please do not run braker.pl with --UTR if you are training from RNA-Seq or protein data, at least not yet. The UTR option only works for AUGUSTUS with pretrained UTR parameters (e.g. if you don't use BRAKER to train any parameters but simply predict genes with existing parameters, which is not what you are doing here, I think).

I have fixed the filterIntronsFindStrand.pl problem, it is now executed by braker.pl for user supplied hints files. I will make a release with the fix today or tomorrow.

The fasta header | issue was fixed a long time ago.

Adjusting the et_score cut-off to 0 might not make much sense. Maybe you get the training to run, then, but results might not be very good. For discussing this, please contact the GeneMark team.

Best,

Katharina (the one who maintains the braker.pl pipeline...)

@sjackman

augustus is packaged for Homebrew and Linuxbrew. I also ran into the issue with bam2hints depending on libbamtools.so.2.4.0, which will be resolved once PR https://github.com/Linuxbrew/homebrew-core/pull/6265 is merged.

@rob123king

What fixed my problem was simply setting the score column to 500, as it was all '.'.
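For reference, that edit can be done with a few lines of Python. This is a sketch only: 500 is simply the value from the comment above, it gives every hint the same weight, and (as Katharina notes above about et_score) overriding what GeneMark filters on may let training run while hurting the results:

```python
import sys


def fill_scores(gff_lines, value="500"):
    """Replace a '.' score (column 6) with `value`; all other lines pass through unchanged."""
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[5] == ".":
            cols[5] = value
            yield "\t".join(cols) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python fill_scores.py hints.gff > hints.scored.gff
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(fill_scores(fh))
```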

@Pinocchiokhaoula

I'm trying to run BRAKER but I always hit the same problem. I need help, because it's part of my master's project and without BRAKER nothing can be done.

Here are my error messages:

Failed to execute: perl /gpfs/scratch/cb/khaoula/Tools/gm_et_linux_64/gmes_petap/gmes_petap.pl --verbose --sequence=/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/genome.fa --ET=/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/genemark_hintsfile.gff --cores=1 1>/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/GeneMark-ET.stdout 2>/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/errors/GeneMark-ET.stderr

1>/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/GeneMark-ET.stdout gives me the message below:

check before run
create directories
commit input data
data report
commit training data
training data report
prepare initial model
get GC of sequence
GC 72
build initial ET model
running step ET_A
running gm.hmm on local system
221 contigs in training
concatenate predictions: /gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/GeneMark-ET/run/ET_A_1
training level ET_A: /gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/GeneMark-ET/run/ET_A_1
error, file not found /gpfs/scratch/cb/khaoula/Tools/gm_et_linux_64/gmes_petap/parse_ET.pl: set.out
error on call: /gpfs/scratch/cb/khaoula/Tools/gm_et_linux_64/gmes_petap/parse_ET.pl --section ET_A --cfg /gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_Zo/GeneMark-ET/run.cfg --v


For another reference genome I get a different message:

1>/gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_zopfiii/GeneMark-ET.stdout

check before run
create directories
commit input data
error, output file is empty data/et.gff
error on call: /gpfs/scratch/cb/khaoula/Tools/gm_et_linux_64/gmes_petap/reformat_gff.pl --out data/et.gff --trace info/dna.trace --in /gpfs/scratch/cb/khaoula/Prototheca/braker/Prototheca_zopfiii/hintsfile.gff --quiet

Any help will be appreciated 👍

Thank you in advance

@br302005

br302005 commented Mar 4, 2020

@Pinocchiokhaoula I am having the same error with the file not found parse: set.out. Did you ever figure it out? Thanks!

@skagawa2

skagawa2 commented Jun 1, 2022

I think I found the issue. scripts/log_reg_prothints.pl, which converts prothint.gff to prothint_augustus.gff, removes the al_score field from the GFF file, and al_score was the other condition used to decide whether to skip a row. As a result, the file passed to GeneMark-EP via parse_by_introns.pl, and eventually to make_nt_freq_mat.pl, contained no entries.

The version of this script that is packaged in BRAKER (even the latest commit) is different from the newest version in ProtHint, which says: Edit by Thomas Bruna: keep the al_score information in the output. By replacing /path/to/BRAKER/scripts/log_reg_prothints.pl with the version found in ProtHint (/path/to/ProtHint/dependencies/log_reg_prothints.pl), I got test2 in BRAKER to finally run (the GeneMark-EP test).
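To check whether a hints file still carries al_score, you can count the features whose attribute column lacks it, for example on prothint.gff versus prothint_augustus.gff. A Python sketch; the `al_score=<value>` form inside the semicolon-separated column 9 is an assumption about ProtHint's output format, so adjust the pattern to what your file actually contains:

```python
import re
import sys

# Assumption: al_score is stored as "al_score=<value>" in the ';'-separated column 9.
AL_SCORE = re.compile(r"(?:^|;)\s*al_score=([^;]+)")


def al_scores(gff_lines):
    """Yield each feature's al_score string, or None when column 9 lacks it."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        m = AL_SCORE.search(cols[8])
        yield m.group(1).strip() if m else None


def count_missing_al_score(gff_lines):
    """Return (features without al_score, total features)."""
    values = list(al_scores(gff_lines))
    return sum(v is None for v in values), len(values)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. run once on prothint.gff and once on prothint_augustus.gff
    with open(sys.argv[1]) as fh:
        missing, total = count_missing_al_score(fh)
    print(f"{missing} of {total} features lack al_score")
```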
