@robsyme
Created November 18, 2009 04:37
GeneMark produces genes with names such as "10000_g". This little awk script zero-pads the numeric ID to five digits and writes a more descriptive FASTA header. Run it with: awk -f scriptname filename
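# Split each line on "_", so ">10000_g" gives $1 = ">10000".
BEGIN {FS="_"}
# Header lines: drop the leading ">" and zero-pad the number to five digits.
/^>/ {printf ">PmpT_%05d.1|PmpT_%05d hypothetical protein [Phoma medicaginis var. medicaginis]\n", substr($1,2), substr($1,2)}
# Every other non-empty line (the sequence itself) is printed unchanged.
/^[^>]/ {print}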
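A quick way to see what the script does is to run it on a tiny made-up file. The sketch below is only an illustration: the file names rename.awk, sample.faa and renamed.faa, and the two short sequences, are invented for the example and are not part of the gist.

```shell
# Save the script under a hypothetical name, rename.awk.
cat > rename.awk <<'EOF'
BEGIN {FS="_"}
/^>/ {printf ">PmpT_%05d.1|PmpT_%05d hypothetical protein [Phoma medicaginis var. medicaginis]\n", substr($1,2), substr($1,2)}
/^[^>]/ {print}
EOF

# Made-up input: two GeneMark-style headers, each followed by a toy sequence.
printf '>42_g\nMKTAYIAKQR\n>10000_g\nMSLLTEVETY\n' > sample.faa

# Rewrite the headers; sequence lines pass through untouched.
awk -f rename.awk sample.faa > renamed.faa
cat renamed.faa

# Prints:
# >PmpT_00042.1|PmpT_00042 hypothetical protein [Phoma medicaginis var. medicaginis]
# MKTAYIAKQR
# >PmpT_10000.1|PmpT_10000 hypothetical protein [Phoma medicaginis var. medicaginis]
# MSLLTEVETY
```

Note that "42" becomes "00042" while "10000" is already five digits wide and is left as is. Blank lines in the input match neither pattern, so they are dropped from the output.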