@jrhawley
Created June 19, 2020 16:01
Awk scripts for filtering gene and transcript annotations from GTF files
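# Script 1: extract gene records from a GTF file
BEGIN {
    # columns are separated by tabs or `; `
    FS = "(\\t|; )";
    # write output separated by tabs
    OFS = "\t";
}
{
    # only select genes; NR > 5 skips the first 5 lines (assumed to be the GTF header)
    if (NR > 5 && $3 == "gene") {
        # clean gene_id (field 9, the first attribute)
        gsub(/gene_id "/, "", $9);
        gsub(/"/, "", $9);
        # clean gene_name (field 11, assumed to be the third attribute)
        gsub(/gene_name "/, "", $11);
        gsub(/"/, "", $11);
        # print chr, start, end, strand, gene_id, gene_name
        print $1, $4, $5, $7, $9, $11
    }
}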
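The field numbers used above depend on how the `FS` pattern splits the attribute column. As a rough illustration, the sketch below splits one made-up gene record with the same `FS` and lists fields 9 onward by index. The record and its values (`SRC`, `G1`, `GENE1`) are placeholders, and the attribute order is an assumption inferred from the field numbers in the script, not something the gist states.

```shell
# One made-up gene record (placeholder values) in the attribute order the script assumes.
line=$(printf 'chr1\tSRC\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "GENE1"; level 2;')

# Split with the same FS as the script and list the attribute fields by index.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "(\\t|; )" } { for (i = 9; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```

With this record, `gene_id` lands in field 9 and `gene_name` in field 11. A file that orders its attributes differently would put them in other fields, so the positions are worth checking this way before running the script on a new annotation source.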
# Script 2: extract transcript records from a GTF file
BEGIN {
    # columns are separated by tabs or `; `
    FS = "(\\t|; )";
    # write output separated by tabs
    OFS = "\t";
}
{
    # only select transcripts; NR > 5 skips the first 5 lines (assumed to be the GTF header)
    if (NR > 5 && $3 == "transcript") {
        # clean the gene_id (field 9)
        gsub(/gene_id "/, "", $9);
        gsub(/"/, "", $9);
        # clean the transcript_id (field 10)
        gsub(/transcript_id "/, "", $10);
        gsub(/"/, "", $10);
        # clean the gene_name (field 12)
        gsub(/gene_name "/, "", $12);
        gsub(/"/, "", $12);
        # clean the transcript_name (field 14)
        gsub(/transcript_name "/, "", $14);
        gsub(/"/, "", $14);
        # print chr, start, end, strand, gene_id, gene_name, transcript_id, transcript_name
        print $1, $4, $5, $7, $9, $12, $10, $14
    }
}
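Both scripts read attributes by fixed field position and skip a fixed number of header lines, so they depend on the layout of the input file. One possible alternative, sketched below, looks up each attribute by its key and skips any line starting with `#`. This is a variant, not the gist author's method: the helper `attr` and the two sample records (with placeholder values, the second deliberately reordered) are hypothetical.

```shell
# Two made-up transcript records; the second lists its attributes in a different order.
sample=$(
  printf '##description: placeholder header\n'
  printf 'chr1\tSRC\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_type "x"; gene_name "GENE1"; transcript_type "x"; transcript_name "GENE1-201";\n'
  printf 'chr1\tSRC\ttranscript\t300\t400\t.\t-\t.\ttranscript_name "GENE2-201"; gene_name "GENE2"; transcript_id "T2"; gene_id "G2";\n'
)

transcripts_out=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = "\t"; OFS = "\t" }   # split on tabs only, so $9 is the whole attribute column

# attr (hypothetical helper): return the quoted value stored under key in s, or "" if absent
function attr(s, key,    n, parts, i, kv) {
    n = split(s, parts, /; */)
    for (i = 1; i <= n; i++) {
        # each part looks like: key "value"
        if (index(parts[i], key " \"") == 1) {
            kv = substr(parts[i], length(key) + 3)   # skip the key, the space and the opening quote
            sub(/"$/, "", kv)                        # drop the closing quote
            return kv
        }
    }
    return ""
}

# skip comment lines instead of a fixed number of header lines
!/^#/ && $3 == "transcript" {
    print $1, $4, $5, $7, attr($9, "gene_id"), attr($9, "gene_name"), attr($9, "transcript_id"), attr($9, "transcript_name")
}')
printf '%s\n' "$transcripts_out"
```

The trade-off is a per-record loop over the attributes in exchange for not relying on attribute order or header length.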