Skip to content

Instantly share code, notes, and snippets.

@gpertea
Last active December 27, 2022 12:46
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save gpertea/b83f1b32435e166afa92a2d388527f4b to your computer and use it in GitHub Desktop.
Save gpertea/b83f1b32435e166afa92a2d388527f4b to your computer and use it in GitHub Desktop.
post-processing of StringTie merge output to append ref_gene_id info to the MSTRG gene_id
#!/bin/env perl
#Usage: mstrg_prep.pl merged.gtf > merged_prep.gtf
use strict;
my %g; # gene_id => \%ref_gene_ids (or gene_names)
my @prep; # array of [line, original_id]
while (<>) {
s/ +$//;
my @t=split(/\t/);
unless (@t>8) { print $_; next }
my ($gid)=($t[8]=~m/gene_id "(MSTRG\.\d+)"/);
if ($gid) {
push(@prep, [$_, $gid]);
my ($rn)=($t[8]=~m/ref_gene_id "([^"]+)/);
#or for gene_name:
#my ($rn)=($t[8]=~m/gene_name "([^"]+)/);
if ($rn) {
my $h=$g{$gid};
if ($h) { $h->{$rn}=1 }
else { $g{$gid}= { $rn=>1 } }
}
}
else { print $_ }
}
my ($prevgid, $gadd);
foreach my $d (@prep) {
my ($line, $gid)=@$d;
if ($prevgid ne $gid) {
$prevgid=$gid;
$gadd='';
if (my $gd=$g{$gid}) {
$gadd='|'.join('|', (sort(keys(%$gd))))
}
}
$line=~s/gene_id "MSTRG\.\d+/gene_id "$gid$gadd/ if $gadd;
print $line;
}
@drdna
Copy link

drdna commented Jul 30, 2021

I might be missing something but won't the last if statement always fail because the $gd hash reference hasn't been created yet?

@gpertea
Copy link
Author

gpertea commented Aug 2, 2021

You mean the one at line 30? if (my $gd=$g{$gid})? (even though that's not the last if statement ;)). If so, no. In that test expression the reference is being created, assigned to $g{$gid} and then it is actually tested.

@drdna
Copy link

drdna commented Aug 2, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment