@iracooke
Created May 16, 2017 05:22

Running signalp on trinity/transdecoder output

Assume we have a FASTA file of proteins called transdecoder.pep, with ids generated by Trinity and Transdecoder. Truncate the names as follows.

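sed -E '/^>/ s/[^:]*::/>/' transdecoder.pep > transdecoder_truncated.pep

This removes everything up to and including the first :: on each header line. The -E flag is accepted by BSD sed (as shipped on a Mac) and by current GNU sed; on an older GNU sed, use -r instead.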
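To see what the truncation does, the sketch below applies the same substitution to a single header. The header is a hypothetical example that assumes a gene::transcript::g.N::m.N id layout; real ids in transdecoder.pep may differ. It uses -E, which BSD sed and recent GNU sed both accept.

```shell
# Hypothetical header line (assumed layout: gene::transcript::g.N::m.N)
header='>TRINITY_DN1000_c0_g1::TRINITY_DN1000_c0_g1_i1::g.1::m.1 type:complete len:120'

# Drop everything up to and including the first "::", then put the ">" back
truncated=$(printf '%s\n' "$header" | sed -E 's/[^:]*::/>/')

echo "$truncated"
```

Only the first :: is consumed, so the rest of the id and the description after the space are left untouched.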

Then run signalp as normal.

signalp -f short transdecoder_truncated.pep > signalp_truncated.out

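Finally, restore the names. Comment lines are passed through unchanged, and every other line gets the gene-level id (the part removed earlier) prepended, followed by ::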

awk '/^#/ {print; next} match($0, /TRINITY_[0-9A-Z]+_c[0-9]+_g[0-9]+/) { printf("%s::%s\n", substr($0, RSTART, RLENGTH), $0) }' signalp_truncated.out > signalp.out
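The restore step can be tried on a small sample before running it on real output. The two lines below are made-up stand-ins for a comment line and a result line, not real signalp output, and the id layout is an assumption. The awk program is the one above, with the comment test anchored to the start of the line.

```shell
# Made-up sample: one comment line and one result line (values are illustrative only)
sample='# name  Cmax  pos
TRINITY_DN1000_c0_g1_i1::g.1::m.1  0.109  27'

# Pass comment lines through; otherwise find the gene-level id
# (TRINITY_..._cN_gN) inside the line and prepend it, followed by "::"
restored=$(printf '%s\n' "$sample" | awk '
  /^#/ { print; next }
  match($0, /TRINITY_[0-9A-Z]+_c[0-9]+_g[0-9]+/) {
    printf("%s::%s\n", substr($0, RSTART, RLENGTH), $0)
  }')

echo "$restored"
```

Lines that are not comments and contain no TRINITY id are dropped by this program, so it is worth comparing the line counts of the input and output files.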