@ag1805x
Last active July 13, 2020 07:00
One-liners to explore a GTF file
# Count unique gene IDs in a GTF file (drop the final `| wc -l` to print the list instead)
awk -F'\t' '$3 == "gene"' ../Homo_sapiens.GRCh38.84.gtf | cut -f9 | cut -d ';' -f1 | cut -d ' ' -f2 | sort -u | wc -l
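# A minimal sketch for printing the gene IDs themselves, without the surrounding quotes.
# The function name gtf_gene_ids is hypothetical, and it assumes Ensembl-style
# attributes of the form gene_id "ENSG..."; in column 9.

```shell
# Hypothetical helper: print each unique gene_id once, without quotes.
# $1 = path to a GTF file
gtf_gene_ids() {
  awk -F'\t' '$3 == "gene" {
    # locate the attribute gene_id "<value>" in column 9
    if (match($9, /gene_id "[^"]+"/)) {
      # skip the 9-character prefix `gene_id "` and the closing quote
      print substr($9, RSTART + 9, RLENGTH - 10)
    }
  }' "$1" | sort -u
}
```

# Usage: gtf_gene_ids ../Homo_sapiens.GRCh38.84.gtf > gene_ids.txt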
# Count genes of a particular biotype, here protein_coding (drop the final `| wc -l` to print the gene IDs instead)
grep 'gene_biotype "protein_coding"' ../Homo_sapiens.GRCh38.84.gtf | awk -F'\t' '$3 == "gene"' | cut -f9 | cut -d ';' -f1 | cut -d ' ' -f2 | sort -u | wc -l
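# To see which biotypes are available before picking one, the sketch below tabulates
# gene counts for every biotype at once. The function name gtf_biotype_counts is
# hypothetical, and it assumes each gene line carries a gene_biotype "..." attribute.

```shell
# Hypothetical helper: count gene features per gene_biotype, most frequent first.
# $1 = path to a GTF file
gtf_biotype_counts() {
  awk -F'\t' '$3 == "gene" {
    # locate the attribute gene_biotype "<value>" in column 9
    if (match($9, /gene_biotype "[^"]+"/)) {
      # skip the 14-character prefix `gene_biotype "` and the closing quote
      print substr($9, RSTART + 14, RLENGTH - 15)
    }
  }' "$1" | sort | uniq -c | sort -k1,1nr
}
```

# Usage: gtf_biotype_counts ../Homo_sapiens.GRCh38.84.gtf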
# Create a subset of an Ensembl GTF file based on gene biotype (keeps the header lines and every feature, not only gene lines, whose gene_biotype matches)
grep -E '^#|gene_biotype "sense_overlapping"' ../Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.sense_overlapping.gtf
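# The same subsetting can be sketched as a reusable function that takes the biotype
# as a parameter. The function name gtf_subset_by_biotype is hypothetical; it matches
# header lines only at the start of a line and looks for the biotype in column 9.

```shell
# Hypothetical helper: keep header lines plus every feature of one gene biotype.
# $1 = gene biotype, $2 = path to a GTF file; the subset is written to stdout
gtf_subset_by_biotype() {
  awk -F'\t' -v pat="gene_biotype \"$1\"" '
    # header lines start with "#"; data lines must contain the attribute literally
    /^#/ || index($9, pat)
  ' "$2"
}
```

# Usage: gtf_subset_by_biotype sense_overlapping ../Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.sense_overlapping.gtf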