
Geo Pertea gpertea

gpertea / mstrg_prep.py
Last active December 9, 2023 06:13
appending ref_gene_id (or gene_name) info to MSTRG gene_ids in StringTie --merge output
#!/bin/env python3
#Usage: mstrg_prep.py merged.gtf > merged_prep.gtf
import re, fileinput
g={} #gene_id => {ref_gene_ids}
prep=[] #array of [line, mstrg_id]
for line in fileinput.input():
    line=line.rstrip()
    t=line.split('\t')
    if len(t)<9:
        print(line)
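The preview above shows only the start of the script. As a hedged sketch of the overall idea (not gpertea's actual code), assume the merged GTF carries `ref_gene_id "..."` on lines that overlap a reference gene. A first pass then fills `g` (MSTRG gene_id => set of reference gene IDs) and `prep` (each line plus its MSTRG id), and a second pass rewrites every `gene_id`. The helper name `prep_gtf` and the `|`-joined output format are hypothetical choices made for illustration.

```python
import re

# Hypothetical sketch of the two-pass approach; not the original script.
# \b keeps "gene_id" from matching inside "ref_gene_id".
GID_RE = re.compile(r'\bgene_id "(MSTRG\.\d+)"')
REF_RE = re.compile(r'\bref_gene_id "([^"]+)"')

def prep_gtf(lines):
    """Return GTF lines with reference gene IDs appended to MSTRG gene_ids.

    Assumption: appended IDs are sorted and joined with '|'.
    """
    g = {}     # MSTRG gene_id => set of ref_gene_ids
    prep = []  # list of [line, mstrg_id or None]
    for line in lines:
        line = line.rstrip()
        t = line.split('\t')
        if len(t) < 9:  # comment/header or malformed line: keep as-is
            prep.append([line, None])
            continue
        m = GID_RE.search(t[8])
        if not m:       # gene_id is not an MSTRG id: keep as-is
            prep.append([line, None])
            continue
        gid = m.group(1)
        refs = g.setdefault(gid, set())
        r = REF_RE.search(t[8])
        if r:
            refs.add(r.group(1))
        prep.append([line, gid])

    out = []
    for line, gid in prep:
        if gid and g.get(gid):
            new_id = gid + '|' + '|'.join(sorted(g[gid]))
            # Replace only the gene_id attribute; a lambda avoids
            # backslash handling in the replacement string.
            line = re.sub(r'\bgene_id "' + re.escape(gid) + '"',
                          lambda _m: f'gene_id "{new_id}"', line, count=1)
        out.append(line)
    return out
```

Because all lines are buffered before any output, exon lines that lack `ref_gene_id` still receive the reference IDs collected from their transcript line. A command-line use could be `print('\n'.join(prep_gtf(fileinput.input())))`.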
gpertea / mstrg_prep.pl
Last active December 27, 2022 12:46
post-processing of StringTie merge output to append ref_gene_id info to the MSTRG gene_id
#!/bin/env perl
#Usage: mstrg_prep.pl merged.gtf > merged_prep.gtf
use strict;
my %g; # gene_id => \%ref_gene_ids (or gene_names)
my @prep; # array of [line, original_id]
while (<>) {
  s/ +$//;
  my @t=split(/\t/);
  unless (@t>8) { print $_; next }
  my ($gid)=($t[8]=~m/gene_id "(MSTRG\.\d+)"/);
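The Perl regex pulls only the MSTRG gene_id out of column 9. A more general approach, sketched here under the assumption that every attribute is a well-formed `key "value";` pair with a double-quoted value (as StringTie writes them), is to parse the whole attribute field into a dictionary. The helpers `parse_gtf_attrs` and `mstrg_gene_id` are hypothetical names, not part of the gist.

```python
import re

# One GTF attribute: key "value". Assumes values are double-quoted.
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')
MSTRG_RE = re.compile(r'MSTRG\.\d+')

def parse_gtf_attrs(field):
    """Parse a GTF column-9 string into a dict (first occurrence of a key wins)."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs

def mstrg_gene_id(line):
    """Return the MSTRG gene_id of a GTF feature line, or None.

    Mirrors the Perl check: lines with fewer than 9 tab-separated
    fields are not feature lines.
    """
    t = line.rstrip('\n').split('\t')
    if len(t) < 9:
        return None
    gid = parse_gtf_attrs(t[8]).get('gene_id', '')
    return gid if MSTRG_RE.fullmatch(gid) else None
```

Parsing into a dictionary means a later step can read `ref_gene_id` or `gene_name` from the same result without writing another regex.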
gpertea / str_split_pattern.cpp
Created February 1, 2018 16:09
fast in-place parsing of a comma-delimited list of int values stored in a string SAM tag
char* str=brec->tag_str("ZD"); //let's say the tag is "ZD"
GVec<int> vals;
char* p=str; //slice start
for (int i=0;;++i) {
  char ch=str[i];
  if (ch==',') {
    str[i]=0;
    int v=atoi(p); //check for int parsing errors?
    vals.Add(v);
    p=str+i+1;
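The C++ loop splits the tag value by overwriting each ',' with a NUL and calling `atoi` on the resulting slice. Its comment raises the question of parsing errors: `atoi` returns 0 when no conversion is possible, so a malformed field passes silently. The sketch below shows the same splitting logic in Python with that check added. It is not in-place, because Python strings are immutable. The name `parse_int_list` and the decision to skip empty fields (for example, from a trailing comma) are assumptions.

```python
import re

# Strict ASCII integer with an optional sign, as atoi would accept when well-formed.
INT_RE = re.compile(r'[+-]?[0-9]+')

def parse_int_list(tag_value):
    """Parse a comma-delimited list of ints, e.g. a string SAM tag value.

    Empty fields are skipped (assumption); any other non-integer field
    raises ValueError instead of silently becoming 0 as with atoi().
    """
    vals = []
    for pos, field in enumerate(tag_value.split(',')):
        if field == '':
            continue
        if not INT_RE.fullmatch(field):
            raise ValueError(f"bad int {field!r} in field {pos}")
        vals.append(int(field))
    return vals
```

For example, `parse_int_list("10,20,30")` returns `[10, 20, 30]`, and `parse_int_list("1,x,3")` raises `ValueError` instead of yielding a 0.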