@suryasaha
Created July 24, 2015 14:03
surya@hlb orthomcl_plasmids (master *%)$ cat proteomes/*.faa > allproteins.faa
surya@hlb orthomcl_plasmids (master *%)$ cat proteomes/*.faa | grep '^>' > names.txt
surya@hlb orthomcl_plasmids (master *%)$ wc -l names.txt
15899 names.txt
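
The header count above is the baseline that later steps are checked against. A minimal sketch of the same bookkeeping on toy data, assuming every record starts with a `>` header line; the two proteome files and their IDs are made up for illustration, and the duplicate-ID check is an addition that is not part of the original session:

```shell
#!/bin/sh
# Hypothetical toy proteomes standing in for proteomes/*.faa.
tmp=$(mktemp -d)
mkdir "$tmp/proteomes"
printf '>A|p1\nMKT\n>A|p2\nMLV\n' > "$tmp/proteomes/A.faa"
printf '>B|p1\nMKS\n' > "$tmp/proteomes/B.faa"

# Same two steps as in the session: concatenate, then pull out header lines.
cat "$tmp"/proteomes/*.faa > "$tmp/allproteins.faa"
grep '^>' "$tmp/allproteins.faa" > "$tmp/names.txt"

# Headers in the FASTA and lines in names.txt should agree.
n_headers=$(grep -c '^>' "$tmp/allproteins.faa")
n_names=$(wc -l < "$tmp/names.txt" | tr -d ' ')

# Extra check (assumption: IDs are meant to be unique across proteomes).
n_dups=$(sort "$tmp/names.txt" | uniq -d | wc -l | tr -d ' ')

echo "headers=$n_headers names=$n_names duplicate_ids=$n_dups"
rm -rf "$tmp"
```
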
surya@hlb orthomcl_plasmids (master *%)$ ./orthomcl.ValidateGroups.pl -s groups.txt.eval-5percent50 -c groups.txt.eval-5percent60 -r groups.txt.eval-5percent30 -m map.txt -n names.txt
4169 records read from groups.txt.eval-5percent50 ..
4159 records read from groups.txt.eval-5percent60 ..
4172 records read from groups.txt.eval-5percent30 ..
Runtime details
System time for process: 0
User time for process: 19.48
surya@hlb orthomcl_plasmids (master *%)$ wc -l Validated_clusters_*.txt
4 Validated_clusters_40.txt
28 Validated_clusters_60.txt
6 Validated_clusters_80.txt
4060 Validated_clusters_common.txt
surya@hlb post_orthomcl_plasmids (master *%)$ ln -s ../orthomcl_plasmids/allproteins.faa
surya@hlb post_orthomcl_plasmids (master *%)$ ln -s ../orthomcl_plasmids/map.txt
surya@hlb post_orthomcl_plasmids (master *%)$ time ./orthomcl.Proteome2sets.pl -s ../orthomcl_plasmids/Validated_clusters_common.txt -c ../orthomcl_plasmids/groups.txt.eval-5percent60 -r ../orthomcl_plasmids/groups.txt.eval-5percent30 -f allproteins.faa -m map_renamed.txt
Subroutine main::getcwd redefined at /usr/share/perl5/Exporter.pm line 67.
at ./orthomcl.Proteome2sets.pl line 11.
.............
Use of uninitialized value $\ in regexp compilation at ./orthomcl.Proteome2sets.pl line 285.
Use of uninitialized value $\ in regexp compilation at ./orthomcl.Proteome2sets.pl line 285.
real 3m36.928s
user 3m36.235s
surya@hlb post_orthomcl_plasmids (master *%)$ time ./orthomcl.Proteome2sets.pl -s ../orthomcl_plasmids/Validated_clusters_common.txt -c ../orthomcl_plasmids/groups.txt.eval-5percent60 -r ../orthomcl_plasmids/groups.txt.eval-5percent30 -f allproteins.faa -m map_renamed.txt -d Proteome2sets.out.xls 2>err
surya@hlb post_orthomcl_plasmids (master *%)$ grep -c '^>' labelled.allproteins.faa
15899
surya@hlb post_orthomcl_plasmids (master *%)$ grep '^>' labelled.allproteins.faa| awk -F"|" '{print $NF}'| sort| uniq -c
5741 core
447 core_paralogous
3191 lineage_specific
699 lineage_specific_paralogous
5250 shared
571 shared_paralogous
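
The six class counts above add up to 15899, the same number of headers found in labelled.allproteins.faa, so every protein received exactly one label. A rough sketch of a single-pass tally that prints the total alongside the per-class counts, assuming (as in the command above) that the class label is the last `|`-separated field of each header; the four toy records are invented for illustration:

```shell
#!/bin/sh
# Hypothetical labelled FASTA; the real headers may carry more fields.
tmp=$(mktemp -d)
printf '>p1|core\nMKT\n>p2|core\nMLV\n>p3|shared\nMKS\n>p4|lineage_specific\nMAA\n' \
  > "$tmp/labelled.faa"

# Count each label (last '|' field) on header lines and keep a running total.
tally=$(awk -F'|' '
  /^>/ { n[$NF]++; total++ }
  END  { for (k in n) print n[k], k; print total, "TOTAL" }
' "$tmp/labelled.faa" | LC_ALL=C sort -k2)

echo "$tally"
rm -rf "$tmp"
```

If the TOTAL line differs from `grep -c '^>'` on the same file, some header is missing a label or the header format is not what this sketch assumes.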
surya@hlb post_orthomcl_plasmids (master *%)$ ~/tools/ncbi-blast-2.2.28+/bin/makeblastdb -in labelled.allproteins.faa -input_type fasta -dbtype prot -out labelled.allproteins
Building a new DB, current time: 11/30/2014 20:32:33
New DB name: labelled.allproteins
New DB title: labelled.allproteins.faa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 15899 sequences in 1.23835 seconds.
surya@hlb post_orthomcl_plasmids$ ls shared_class/| sed 's,\.faa,,' > shared_clusters.names
surya@hlb post_orthomcl_plasmids$ ls core_class/| sed 's,\.faa,,' > core_clusters.names
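
The `sed 's,\.faa,,'` pattern above is unanchored, so it removes the first `.faa` it finds in each name rather than only a trailing extension. For plain names of the form `<cluster>.faa` the result is the same. A possible alternative using `basename`, shown on a toy directory; the cluster file names here are hypothetical:

```shell
#!/bin/sh
# Hypothetical per-cluster FASTA files standing in for core_class/.
tmp=$(mktemp -d)
mkdir "$tmp/core_class"
touch "$tmp/core_class/cluster1.faa" "$tmp/core_class/cluster2.faa"

# basename strips the directory and only a trailing ".faa" suffix.
for f in "$tmp"/core_class/*.faa; do
  basename "$f" .faa
done > "$tmp/core_clusters.names"

names=$(cat "$tmp/core_clusters.names")
echo "$names"
rm -rf "$tmp"
```
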