@rknx
Created January 4, 2022 20:53
Convert GFF3 file to GTF2.5
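GFF3 and GTF differ mainly in column 9. GFF3 writes attributes as `key=value` pairs separated by `;`, while GTF writes `key "value";` pairs. The sketch below applies the script's column-9 sed substitutions (without the tRNA rename) to a made-up attribute string so the rewrite is visible in isolation. The variable names are illustrative only.

```shell
#!/bin/bash
# Made-up GFF3 attribute column (column 9): key=value pairs separated by ";"
gff3_attrs='ID=ABC_00001;Name=dnaA;product=Chromosomal replication initiator protein DnaA'

# ID= -> gene_id=, then "=" -> ' "', ";" -> '"; ', and close the last value with '"'
gtf_attrs=$(printf '%s\n' "$gff3_attrs" | sed 's/ID=/gene_id=/g; s/=/ "/g; s/;/"; /g; s/$/"/g')

printf '%s\n' "$gtf_attrs"
```

This prints `gene_id "ABC_00001"; Name "dnaA"; product "Chromosomal replication initiator protein DnaA"`.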
#!/bin/bash
########## Anuj Sharma ##########
########## rknx@outlook.com ##########
########## github/rknx ##########
########## 2022/01/04 ##########
[[ -z "$1" ]] && echo "Usage: gff2gtf.sh in.gff > out.gtf" >&2 && exit 1
[[ ! -s "$1" ]] && echo "Provide a valid, non-empty input file" >&2 && exit 1
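# Stop at ##FASTA, dropping the embedded sequences that follow it
# Drop directive and comment lines (##gff-version, ##sequence-region <contig> ..., #...)
# Simplify the source (2nd) column, e.g. "tool:version" -> "tool"
# Format the 9th column: ID= becomes gene_id, key=value pairs become key "value";
# a trailing ";" is stripped first so no stray quote is appended after it
# Rename tRNA to transcript and keep only valid GTF feature types, checking column 3 only
sed -n '/##FASTA/q;p' "$1" | \
grep -v "^#" | \
awk -vFS="\t" -vOFS="\t" '{split($2, a, ":"); $2=a[1]; print $0}' | \
sed 's/;$//; s/ID=/gene_id=/g; s/=/ "/g; s/;/"; /g; s/$/"/g' | \
awk -vFS="\t" -vOFS="\t" '$3 == "tRNA" {$3 = "transcript"} $3 ~ /gene|transcript|exon|CDS|UTR|start_codon|stop_codon|Selenocysteine/'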
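To try the conversion without a real annotation, the sketch below wraps the pipeline in a function that reads GFF3 from stdin. The function names `gff2gtf` and `sample_gff` are hypothetical, and the input records are made up in a Prokka-like style. This version renames tRNA and filters on column 3 only, so the rRNA record, whose type is not a GTF feature, is dropped.

```shell
#!/bin/bash
# Hypothetical helper: the conversion pipeline, reading GFF3 on stdin
gff2gtf() {
  sed -n '/##FASTA/q;p' | \
  grep -v "^#" | \
  awk -vFS="\t" -vOFS="\t" '{split($2, a, ":"); $2=a[1]; print $0}' | \
  sed 's/;$//; s/ID=/gene_id=/g; s/=/ "/g; s/;/"; /g; s/$/"/g' | \
  awk -vFS="\t" -vOFS="\t" '$3 == "tRNA" {$3 = "transcript"} $3 ~ /gene|transcript|exon|CDS|UTR|start_codon|stop_codon|Selenocysteine/'
}

# Hypothetical, made-up input: directives, a CDS, a tRNA, an rRNA, then FASTA
sample_gff() {
  printf '##gff-version 3\n'
  printf '##sequence-region contig_1 1 5000\n'
  printf 'contig_1\tProdigal:2.6\tCDS\t100\t900\t.\t+\t0\tID=ABC_00001;product=hypothetical protein\n'
  printf 'contig_1\tAragorn:1.2\ttRNA\t1200\t1275\t.\t-\t.\tID=ABC_00002;product=tRNA-Leu(cag)\n'
  printf 'contig_1\tbarrnap:0.9\trRNA\t2000\t3500\t.\t+\t.\tID=ABC_00003;product=16S ribosomal RNA\n'
  printf '##FASTA\n'
  printf '>contig_1\nACGTACGT\n'
}

out=$(sample_gff | gff2gtf)
printf '%s\n' "$out"
```

The output is two tab-separated lines: the CDS, and the tRNA with its type changed to `transcript`. Each has its source reduced to the tool name and column 9 rewritten as `gene_id "ABC_0000N"; product "..."`. The product text `tRNA-Leu(cag)` is left untouched because the rename only looks at column 3.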