

@sp00nman
Last active February 9, 2024 07:03
Calculate transcript lengths from a GTF file
# resource: http://seqanswers.com/forums/showthread.php?t=4914
# Calculate the length of each transcript in a GTF file (the sum of its exon lengths)
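awk -F"\t" '
$3=="exon" {
  # Assumes transcript_id is the last attribute in column 9, is exactly
  # 15 characters long, and is followed by `";` (as in the example below)
  ID = substr($9, length($9)-16, 15)
  # GTF start/end are 1-based and inclusive, so an exon spans end - start + 1 bases
  L[ID] += $5 - $4 + 1
}
END {
  for (i in L)
    print i "\t" L[i]
}
' gtf-file >output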
# Example input (gtf-file):
#chr1 hg19_ensGene exon 66999066 66999090 0.000000 + . gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
#chr1 hg19_ensGene start_codon 67000042 67000044 0.000000 + . gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
#chr1 hg19_ensGene CDS 67000042 67000051 0.000000 + 0 gene_id "ENSG00000118473"; gene_name "SGIP1"; transcript_id "ENST00000237247";
# Example output (transcript ID, total exon length in bp):
#ENST00000397500 2194
#ENST00000344941 2923
#ENST00000397501 4715
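A quick way to check that only exon rows are summed is to run the same logic on a tiny, made-up GTF. The transcript ID `TX0000000000001` below is hypothetical, chosen to be 15 characters so the fixed-offset `substr()` still applies; the file names `toy.gtf` and `toy.out` are likewise just for illustration.

```shell
#!/bin/sh
# Build a tiny, made-up GTF: two exons of one transcript plus a CDS row.
# Columns are tab-separated; column 9 ends with the 15-character ID and `";`.
attrs='gene_id "G1"; transcript_id "TX0000000000001";'
printf 'chr1\tdemo\texon\t100\t199\t.\t+\t.\t%s\n' "$attrs" >  toy.gtf
printf 'chr1\tdemo\tCDS\t120\t150\t.\t+\t0\t%s\n'  "$attrs" >> toy.gtf
printf 'chr1\tdemo\texon\t300\t349\t.\t+\t.\t%s\n' "$attrs" >> toy.gtf

# Same logic as the script above: sum end - start + 1 over exon rows only.
awk -F"\t" '
$3=="exon" {
  ID = substr($9, length($9)-16, 15)
  L[ID] += $5 - $4 + 1
}
END {
  for (i in L)
    print i "\t" L[i]
}
' toy.gtf > toy.out

# Prints TX0000000000001<TAB>150 (100 bp + 50 bp; the CDS row is not counted)
cat toy.out
```

The opening brace must sit on the same line as `$3=="exon"`: if it is moved to the next line, awk treats the pattern as a separate rule that prints matching rows and applies the braced block to every row, so the CDS row would be counted too.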
@moa4020

moa4020 commented Feb 9, 2024

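Suggested modification: extract the transcript ID with `match()` instead of a fixed-offset `substr()`, so it does not depend on the ID being the last attribute in column 9: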

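awk -F"\t" '
$3=="exon" {
  # Mouse Ensembl transcript IDs (ENSMUST...) with an optional ".N" version;
  # the dot is escaped so it only matches a literal "."
  if (match($9, /ENSMUST[0-9]+(\.[0-9]+)?/)) {
    ID = substr($9, RSTART, RLENGTH);
    L[ID] += $5 - $4 + 1;
  }
}
END {
  for (i in L) {
    print i "\t" L[i]
  }
}
' gtf-file > output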
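The `ENSMUST` pattern only finds mouse Ensembl IDs. A species-agnostic variant (a sketch, not part of either script above) can locate the `transcript_id` attribute by name and take whatever is between its quotes. The rows and the IDs `TX1.2` and `TX2.1` below are made up to show a case where `transcript_id` is not the last attribute and carries a version suffix; the file names are illustrative.

```shell
#!/bin/sh
# Made-up rows: transcript_id is followed by another attribute and has a
# version suffix, which the fixed-offset substr() would mis-read.
printf 'chr1\tdemo\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "TX1.2"; exon_number "1";\n'  >  toy2.gtf
printf 'chr1\tdemo\texon\t21\t40\t.\t+\t.\tgene_id "G1"; transcript_id "TX1.2"; exon_number "2";\n' >> toy2.gtf
printf 'chr1\tdemo\texon\t5\t9\t.\t+\t.\tgene_id "G1"; transcript_id "TX2.1"; exon_number "1";\n'   >> toy2.gtf

awk -F"\t" '
$3=="exon" {
  # Find the attribute by name, so any species prefix or version works.
  if (match($9, /transcript_id "[^"]+"/)) {
    # Skip the 15-character prefix `transcript_id "` and drop the closing quote.
    ID = substr($9, RSTART + 15, RLENGTH - 16)
    L[ID] += $5 - $4 + 1
  }
}
END {
  for (i in L)
    print i "\t" L[i]
}
' toy2.gtf | sort > toy2.out

# Prints TX1.2<TAB>30 (10 + 20 bp) and TX2.1<TAB>5
cat toy2.out
```

`for (i in L)` visits keys in an unspecified order, which is why the output is piped through `sort` here.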
