suryasaha/Post OrthoMCL processing

## Post OrthoMCL processing
surya@hlb orthomcl_plasmids (master *%)$ cat proteomes/*.faa > allproteins.faa
surya@hlb orthomcl_plasmids (master *%)$ cat proteomes/*.faa | grep '^>' > names.txt
surya@hlb orthomcl_plasmids (master *%)$ wc -l names.txt
15899 names.txt

surya@hlb orthomcl_plasmids (master *%)$ ./orthomcl.ValidateGroups.pl -s groups.txt.eval-5percent50 -c groups.txt.eval-5percent60 -r groups.txt.eval-5percent30 -m map.txt -n names.txt
4169 records read from groups.txt.eval-5percent50 ..
4159 records read from groups.txt.eval-5percent60 ..
4172 records read from groups.txt.eval-5percent30 ..
Runtime details
System time for process: 0
User time for process: 19.48
surya@hlb orthomcl_plasmids (master *%)$ wc -l Validated_clusters_*.txt
     4 Validated_clusters_40.txt
    28 Validated_clusters_60.txt
     6 Validated_clusters_80.txt
  4060 Validated_clusters_common.txt


surya@hlb post_orthomcl_plasmids (master *%)$ ln -s ../orthomcl_plasmids/allproteins.faa
surya@hlb post_orthomcl_plasmids (master *%)$ ln -s ../orthomcl_plasmids/map.txt
surya@hlb post_orthomcl_plasmids (master *%)$ time ./orthomcl.Proteome2sets.pl -s ../orthomcl_plasmids/Validated_clusters_common.txt -c ../orthomcl_plasmids/groups.txt.eval-5percent60 -r ../orthomcl_plasmids/groups.txt.eval-5percent30 -f allproteins.faa -m map_renamed.txt
Subroutine main::getcwd redefined at /usr/share/perl5/Exporter.pm line 67.
 at ./orthomcl.Proteome2sets.pl line 11.
.............
Use of uninitialized value $\ in regexp compilation at ./orthomcl.Proteome2sets.pl line 285.
Use of uninitialized value $\ in regexp compilation at ./orthomcl.Proteome2sets.pl line 285.
real    3m36.928s
user    3m36.235s
surya@hlb post_orthomcl_plasmids (master *%)$ time ./orthomcl.Proteome2sets.pl -s ../orthomcl_plasmids/Validated_clusters_common.txt -c ../orthomcl_plasmids/groups.txt.eval-5percent60 -r ../orthomcl_plasmids/groups.txt.eval-5percent30 -f allproteins.faa -m map_renamed.txt -d Proteome2sets.out.xls 2>err


surya@hlb post_orthomcl_plasmids (master *%)$ grep -c '^>' labelled.allproteins.faa
15899
surya@hlb post_orthomcl_plasmids (master *%)$ grep '^>' labelled.allproteins.faa| awk -F"|" '{print $NF}'| sort| uniq -c
   5741  core
    447  core_paralogous
   3191  lineage_specific
    699  lineage_specific_paralogous
   5250  shared
    571  shared_paralogous
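
The six class counts above sum to 15899, which matches the header count from `grep -c`. A minimal sketch of automating that check is below. It assumes the class label is the last `|`-delimited field of each header (as the `awk -F"|"` call above implies) and that only the six labels seen in the output are valid. The toy FASTA file and its headers are hypothetical, not taken from the real data.

```shell
# Hypothetical toy input: the class label is the last |-delimited header field.
tmp=$(mktemp -d)
cat > "$tmp/toy.faa" <<'EOF'
>p1|genomeA| core
MKT
>p2|genomeA| shared_paralogous
MLS
>p3|genomeB| lineage_specific
MKV
>p4|genomeB| unknown
MKA
EOF

# Total number of FASTA headers.
total=$(grep -c '^>' "$tmp/toy.faa")

# Headers whose trimmed last field is NOT one of the six expected labels.
unlabelled=$(grep '^>' "$tmp/toy.faa" | awk -F'|' '
  { lab = $NF; gsub(/^[ \t]+|[ \t]+$/, "", lab) }
  lab !~ /^(core|shared|lineage_specific)(_paralogous)?$/ { bad++ }
  END { print bad + 0 }')

echo "headers: $total, without a recognised class label: $unlabelled"
```

On the real `labelled.allproteins.faa`, a non-zero second number would point to headers that `orthomcl.Proteome2sets.pl` left unclassified or labelled unexpectedly.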

surya@hlb post_orthomcl_plasmids (master *%)$ ~/tools/ncbi-blast-2.2.28+/bin/makeblastdb -in labelled.allproteins.faa -input_type fasta -dbtype prot -out labelled.allproteins
Building a new DB, current time: 11/30/2014 20:32:33
New DB name:   labelled.allproteins
New DB title:  labelled.allproteins.faa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 15899 sequences in 1.23835 seconds.

surya@hlb post_orthomcl_plasmids$ ls shared_class/| sed 's,\.faa,,' > shared_clusters.names
surya@hlb post_orthomcl_plasmids$ ls core_class/| sed 's,\.faa,,' > core_clusters.names
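
The two `ls | sed` commands above build lists of cluster names from the per-class directories. As a hedged alternative sketch, the same list can be built with a glob and `basename`, which avoids parsing `ls` output and strips `.faa` only when it is the suffix. The directory contents here (`cluster1.faa`, `cluster2.faa`) are hypothetical placeholders.

```shell
# Hypothetical stand-in for core_class/ with two cluster FASTA files.
tmp=$(mktemp -d)
mkdir "$tmp/core_class"
touch "$tmp/core_class/cluster1.faa" "$tmp/core_class/cluster2.faa"

# Expand the glob, then strip the directory and the trailing .faa.
for f in "$tmp"/core_class/*.faa; do
  basename "$f" .faa
done > "$tmp/core_clusters.names"

# Number of cluster names written (tr removes padding some wc versions add).
n_names=$(wc -l < "$tmp/core_clusters.names" | tr -d ' ')
echo "$n_names cluster names written"
```

Unlike `sed 's,\.faa,,'`, which removes the first `.faa` anywhere in a name, `basename` only removes it at the end of the file name.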