Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Calculate transcript length of GTF file
# resource: http://seqanswers.com/forums/showthread.php?t=4914
# Calculate length for each transcript for a GTF file
awk -F"\t" '
$3=="exon"
{
ID=substr($9, length($9)-16, 15);
L[ID]+=$5-$4+1
}
END{
for(i in L)
{print i"\t"L[i]}
}
' gtf-file >output
#gtf-file
#chr1 hg19_ensGene exon 66999066 66999090 0.000000 + . gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
#chr1 hg19_ensGene start_codon 67000042 67000044 0.000000 + . gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
#chr1 hg19_ensGene CDS 67000042 67000051 0.000000 + 0 gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
#output
#ENST00000397500 2194
#ENST00000344941 2923
#ENST00000397501 4715
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment