@robsyme
Created November 17, 2009 09:28
# We start with something like
# >1_1 [17 - 52] 53 8
# We want to end up with something like
# >Pmp_orf_00001_00001|Pmp_orf_00001_00001 open reading frame 17-52 [Phoma medicaginis var. pinodella]
# Get rid of the length and coverage info (the '53 8' in the example above)
awk '/^>/ {print $1, $2, $3, $4} /^[^>]/ {print}' OMT5_6frame.fa > nojunk.fasta
# Remove the brackets and replace the underscores with spaces
sed -e 's/[][]//g' -e 's/_/ /g' nojunk.fasta > minimal.fasta
# We should now have something like
# >1 1 17 - 52
# Format it nicely with awk ($3 is the start, $4 is the '-', $5 is the end)
awk 'BEGIN {OFS=""}; /^>/ {printf ">Pmp_orf_%05d_%05d|Pmp_orf_%05d_%05d open reading frame %d-%d [Phoma medicaginis var. pinodella]\n", substr($1,2), $2, substr($1,2), $2, $3, $5} /^[^>]/ {print}' minimal.fasta > M07-4-contigs-6frame.pp.fa
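# A possible single-pass variant (a sketch, not part of the original pipeline):
# the three steps above can be folded into one awk call with no intermediate files.
# It assumes every header looks like '>1_1 [17 - 52] 53 8'. The sample_6frame.*
# file names and the sequence line are made up for the demonstration.

```shell
# Hypothetical one-record sample in the same layout as the example header above.
printf '>1_1 [17 - 52] 53 8\nMKVLAAGIVGLCA\n' > sample_6frame.fa

awk '
/^>/ {
    # $1 = ">contig_orf", $2 = "[start", $3 = "-", $4 = "end]"; $5 and $6 are dropped.
    split(substr($1, 2), id, "_")          # id[1] = contig number, id[2] = ORF number
    start = $2; gsub(/\[|\]/, "", start)   # strip the opening bracket
    end   = $4; gsub(/\[|\]/, "", end)     # strip the closing bracket
    name = sprintf("Pmp_orf_%05d_%05d", id[1], id[2])
    printf ">%s|%s open reading frame %d-%d [Phoma medicaginis var. pinodella]\n", name, name, start, end
    next
}
/^[^>]/ { print }                          # pass non-empty sequence lines through unchanged
' sample_6frame.fa > sample_6frame.pp.fa

cat sample_6frame.pp.fa
```

# Unlike the sed step above, this only touches header lines, so any underscore
# or bracket in a sequence line would be left alone.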