
Testing the VirtualFlow tutorial on AWS ParallelCluster

https://docs.virtual-flow.org/tutorials/-LdE94b2AVfBFT72zK-v/

set up AWS ParallelCluster

  • example config file for AWS ParallelCluster
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[cluster default]
key_name = <KEY_NAME>
base_os = centos7
scheduler = slurm
master_instance_type = c5.xlarge
compute_instance_type = c5.2xlarge
maintain_initial_size = true
vpc_settings = default
master_root_volume_size = 1000
dcv_settings = dcv1
max_queue_size = 100

tags = {"Project": "ParallelCluster-virtualflow"}

[vpc default]
vpc_id = <VPC_ID>
master_subnet_id = <MASTER_SUBNET_ID>
compute_subnet_id = <COMPUTE_SUBNET_ID>
use_public_ips = false

[dcv dcv1]
enable = master
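
With the config above saved as ~/.parallelcluster/config, the cluster can be created with the ParallelCluster 2.x CLI. A minimal sketch (the cluster name virtualflow is an assumption):

# create the cluster and wait for the master node to come up
pcluster create virtualflow

# log in to the master node
pcluster ssh virtualflow -i <KEY_NAME>.pem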

Tutorial 1

  • with SGE: did not work
  • with Slurm: works, with the preparation below

prepare on AWS ParallelCluster

  • edit the partition in templates/all.ctrl
partition=compute
  • edit the Slurm settings: sudo emacs /opt/slurm/etc/slurm.conf

The value of RealMemory can be found by running /opt/slurm/sbin/slurmd -C on a compute node.

See also this issue: aws/aws-parallelcluster#1517

NodeName=DEFAULT RealMemory=14938
include slurm_parallelcluster_nodes.conf
PartitionName=compute Nodes=ALL Default=YES MaxTime=INFINITE State=UP
  • after editing, restart slurmctld:
sudo service slurmctld restart
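
After the restart, the partition can be checked with standard Slurm commands (output shape will vary with the cluster state):

$ sinfo
$ scontrol show partition compute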

install VFTools

https://docs.virtual-flow.org/documentation/-LdE8RH9UN4HKpckqkX3/vftools/installation-1

  • install OpenBabel
sudo yum install -y epel-release   # Open Babel is packaged in EPEL on CentOS 7
sudo yum install -y openbabel
  • install VFTools
wget https://github.com/VirtualFlow/VFTools/archive/master.tar.gz
tar -xvzf master.tar.gz
mv VFTools-master VFTools
  • set PATH
export PATH="<parent folder>/VFTools/bin:$PATH"
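
A quick sanity check that both tools are on the PATH (vfvs_pp_prepare_dockingposes.sh is one of the VFTools scripts used later in this note):

$ obabel -V
$ which vfvs_pp_prepare_dockingposes.sh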

Tutorial 1: important points

results

submit job

vf_start_jobline.sh starts joblines 1 through 12 from the Slurm job template; the last argument is the delay in seconds between submissions (the jobs below go in one second apart).

$ ./vf_start_jobline.sh 1 12 templates/template1.slurm.sh submit 1                                           


        ::  ::  ::  ::::. :::::: ::  ::  .::::.  ::      :::::  ::    .::::. ::      ::
        ::  ::  ::  :: ::   ::   ::  ::  ::  ::  ::      ::     ::    ::  :: ::  ::  ::
         ::::   ::  :::.    ::   ::  ::  ::::::  ::      :::::  ::    ::  ::  ::::::::
          ::    ::  :: ::   ::    ::::   ::  ::  ::::    ::     ::::: '::::'   ::  ::



Syncing the jobfile of jobline 1 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 2 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 3 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 4 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 5 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 6 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 7 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 8 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 9 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 10 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 11 with the controlfile file ../../workflow/control/all.ctrl.
Syncing the jobfile of jobline 12 with the controlfile file ../../workflow/control/all.ctrl.

Submitted batch job 4
The job for jobline 1 has been submitted at Sun Jul 12 06:41:50 UTC 2020.

Submitted batch job 5
The job for jobline 2 has been submitted at Sun Jul 12 06:41:51 UTC 2020.

Submitted batch job 6
The job for jobline 3 has been submitted at Sun Jul 12 06:41:52 UTC 2020.

Submitted batch job 7
The job for jobline 4 has been submitted at Sun Jul 12 06:41:53 UTC 2020.

Submitted batch job 8
The job for jobline 5 has been submitted at Sun Jul 12 06:41:54 UTC 2020.

Submitted batch job 9
The job for jobline 6 has been submitted at Sun Jul 12 06:41:55 UTC 2020.

Submitted batch job 10
The job for jobline 7 has been submitted at Sun Jul 12 06:41:56 UTC 2020.

Submitted batch job 11
The job for jobline 8 has been submitted at Sun Jul 12 06:41:57 UTC 2020.

Submitted batch job 12
The job for jobline 9 has been submitted at Sun Jul 12 06:41:58 UTC 2020.

Submitted batch job 13
The job for jobline 10 has been submitted at Sun Jul 12 06:41:59 UTC 2020.

Submitted batch job 14
The job for jobline 11 has been submitted at Sun Jul 12 06:42:00 UTC 2020.

Submitted batch job 15
The job for jobline 12 has been submitted at Sun Jul 12 06:42:01 UTC 2020.

squeue

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 4   compute    t-1.1   centos  R       5:37      1 ip-10-0-19-5
                 5   compute    t-2.1   centos  R       5:34      1 ip-10-0-19-5
                 6   compute    t-3.1   centos  R       5:34      1 ip-10-0-19-5
                 7   compute    t-4.1   centos  R       5:34      1 ip-10-0-19-5
                 8   compute    t-5.1   centos  R       5:33      1 ip-10-0-19-5
                 9   compute    t-6.1   centos  R       5:31      1 ip-10-0-19-5
                10   compute    t-7.1   centos  R       5:31      1 ip-10-0-19-5
                11   compute    t-8.1   centos  R       5:28      1 ip-10-0-19-5
                12   compute    t-9.1   centos  R       1:30      1 ip-10-0-22-46
                13   compute   t-10.1   centos  R       1:30      1 ip-10-0-22-46

monitoring jobs

$ ./vf_report.sh -c workflow


        ::  ::  ::  ::::. :::::: ::  ::  .::::.  ::      :::::  ::    .::::. ::      ::
        ::  ::  ::  :: ::   ::   ::  ::  ::  ::  ::      ::     ::    ::  :: ::  ::  ::
         ::::   ::  :::.    ::   ::  ::  ::::::  ::      :::::  ::    ::  ::  ::::::::
          ::    ::  :: ::   ::    ::::   ::  ::  ::::    ::     ::::: '::::'   ::  ::



                                  Sun Jul 12 06:48:26 UTC 2020                                       


                                         Workflow Status                                        
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


                                             Joblines    
................................................................................................

 Number of jobfiles in the workflow/jobfiles/main folder: 12
 Number of joblines in the batch system: 10
 Number of joblines in the batch system currently running: 10
  * Number of joblines in queue "compute" currently running: 10
 Number of joblines in the batch system currently not running: 0
  * Number of joblines in queue "compute" currently not running: 0
 Number of cores/slots currently used by the workflow: 10


                                            Collections    
................................................................................................

 Total number of ligand collections: 68
 Number of ligand collections completed: 6
 Number of ligand collections in state "processing": 10
 Number of ligand collections not yet started: 52                                   


                                 Ligands (in completed collections)   
................................................................................................

 Total number of ligands: 1123                                                     
 Number of ligands started: 8                                                     
 Number of ligands successfully completed: 8                                                
 Number of ligands failed: 0                                                


                                Dockings (in completed collections)   
................................................................................................

 Docking runs per ligand: 2
 Number of dockings started: 16                                                     
 Number of dockings successfully completed: 16                                                
 Number of dockings failed: 0                                                

$ ./vf_report.sh -c vs -d qvina02_rigid_receptor1 -n 10


        ::  ::  ::  ::::. :::::: ::  ::  .::::.  ::      :::::  ::    .::::. ::      ::
        ::  ::  ::  :: ::   ::   ::  ::  ::  ::  ::      ::     ::    ::  :: ::  ::  ::
         ::::   ::  :::.    ::   ::  ::  ::::::  ::      :::::  ::    ::  ::  ::::::::
          ::    ::  :: ::   ::    ::::   ::  ::  ::::    ::     ::::: '::::'   ::  ::



                                  Sun Jul 12 06:49:09 UTC 2020                                       


                              Preliminary Virtual Screening Results                             
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


                                  Binding affinity - statistics    
................................................................................................

 Number of ligands screened with binding affinity between     0  and   inf kcal/mole: 0
 Number of ligands screened with binding affinity between  -0.1  and  -5.0 kcal/mole: 1
 Number of ligands screened with binding affinity between  -5.0  and  -5.5 kcal/mole: 5
 Number of ligands screened with binding affinity between  -5.5  and  -6.0 kcal/mole: 7
 Number of ligands screened with binding affinity between  -6.0  and  -6.5 kcal/mole: 4
 Number of ligands screened with binding affinity between  -6.5  and  -7.0 kcal/mole: 0
 Number of ligands screened with binding affinity between  -7.0  and  -7.5 kcal/mole: 1
 Number of ligands screened with binding affinity between  -7.5  and  -8.0 kcal/mole: 0
 Number of ligands screened with binding affinity between  -8.0  and  -8.5 kcal/mole: 0
 Number of ligands screened with binding affinity between  -8.5  and  -9.0 kcal/mole: 0
 Number of ligands screened with binding affinity between  -9.0  and  -9.5 kcal/mole: 0
 Number of ligands screened with binding affinity between  -9.5  and -10.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -10.0  and -10.5 kcal/mole: 0
 Number of ligands screened with binding affinity between -10.5  and -11.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -11.0  and -11.5 kcal/mole: 0
 Number of ligands screened with binding affinity between -11.5  and -12.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -12.0  and -12.5 kcal/mole: 0
 Number of ligands screened with binding affinity between -12.5  and -13.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -13.0  and -13.5 kcal/mole: 0
 Number of ligands screened with binding affinity between -13.5  and -14.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -14.0  and -14.5 kcal/mole: 0
 Number of ligands screened with binding affinity between -14.5  and -15.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -15.0  and -20.0 kcal/mole: 0
 Number of ligands screened with binding affinity between -20.0  and  -inf kcal/mole: 0


                          Binding affinity - highest scoring compounds    
................................................................................................

       Rank  Ligand                Collection    Highest-Score

       1     PV-001938623963_2_T1  GACBEG_00000  -7.0
       2     Z3013447159_1_T1      GACCAD_00000  -6.3
       3     PV-001873781580_1     HACBAE_00000  -6.3
       4     PV-001873781580_2     HACBAE_00000  -6.0
       5     PV-001847295098_1     HACBFF_00000  -6.0
       6     PV-001938623963_1_T1  GACBEG_00000  -5.8
       7     PV-001873781822_1     HACBAE_00000  -5.7
       8     Z2801168368_1_T1      GACACC_00000  -5.6
       9     Z2092504580_1_T1      GAFFCG_00000  -5.6
       10    Z2092508107_1_T1      GAFFCG_00000  -5.6
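
To keep an eye on progress, the report can simply be rerun periodically; a small sketch using the standard watch utility (the interval is arbitrary):

$ watch -n 60 ./vf_report.sh -c workflow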

generate docking poses

$ vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_tranch compounds dockingsposes overwrite


*********************************************************************
                  Extracting the winning structrures                 
*********************************************************************

 * The output folder dockingsposes does already exist. Removing...
 * The file compounds.energies does already exist. Deleting...

 *** Preparing structure Z2624037004_3 ***
GACEBG/00000.tar.gz
00000/Z2624037004_3_replica-1.pdbqt
9 molecules converted
9 files output. The first is Z2624037004_3.rank-1.pdb
9 molecules converted
9 files output. The first is Z2624037004_3.rank-1.sdf
9 molecules converted

 *** Preparing structure Z2624037004_4 ***
GACEBG/00000.tar.gz
00000/Z2624037004_4_replica-1.pdbqt
5 molecules converted
5 files output. The first is Z2624037004_4.rank-1.pdb
5 molecules converted
5 files output. The first is Z2624037004_4.rank-1.sdf
5 molecules converted

 *** Preparing structure Z2087260951_4 ***
GACEBG/00000.tar.gz
00000/Z2087260951_4_replica-1.pdbqt
9 molecules converted
9 files output. The first is Z2087260951_4.rank-1.pdb
9 molecules converted
9 files output. The first is Z2087260951_4.rank-1.sdf
9 molecules converted

 *** Preparing structure Z2087256678_1 ***
GACEBG/00000.tar.gz
00000/Z2087256678_1_replica-1.pdbqt
9 molecules converted
9 files output. The first is Z2087256678_1.rank-1.pdb
9 molecules converted
9 files output. The first is Z2087256678_1.rank-1.sdf
9 molecules converted

 *** Preparing structure Z2087260951_2 ***
GACEBG/00000.tar.gz
00000/Z2087260951_2_replica-1.pdbqt
9 molecules converted
9 files output. The first is Z2087260951_2.rank-1.pdb
9 molecules converted
9 files output. The first is Z2087260951_2.rank-1.sdf
9 molecules converted

docking poses

$ pwd
/home/centos/virtualflow/VFVS_GK/pp/docking_poses/qvina02_rigid_receptor1/dockingsposes.plain
$ ls
100_Z1669288933_1_T1.pdb     32_Z2700583334_1.pdb         55_Z1175719058_2.pdb         78_Z2046069599_1.pdb
10_Z2624037004_1.pdb         33_Z2638723223_1.pdb         56_Z2505285340_1_T1.pdb      79_Z1668414848_2_T1.pdb
11_Z2211137992_4.pdb         34_Z2230216305_1_T1.pdb      57_Z2378042591_6_T1.pdb      7_Z2087256678_2.pdb
12_Z2211139111_1.pdb         35_Z1237025175_1.pdb         58_PV-001089728404_1_T1.pdb  80_PV-001873778304_2.pdb
13_Z2624037004_2.pdb         36_PV-001282503059_2.pdb     59_Z2364809982_1_T1.pdb      81_PV-001915879035_1.pdb
14_PV-001701895824_2.pdb     37_PV-001377853194_1_T1.pdb  5_Z2087260951_4.pdb          82_Z2144418621_1.pdb
15_Z2087260951_1.pdb         38_Z2364787117_1_T1.pdb      60_Z1897122191_4.pdb         83_Z2144418621_4.pdb
16_Z2700583334_2.pdb         39_PV-001288562049_2.pdb     61_PV-001826919885_7.pdb     84_Z2221447237_1.pdb
17_Z2211137992_2.pdb         3_Z2087256678_1.pdb          62_Z2717222271_4.pdb         85_Z1175719058_1.pdb
18_Z2211139111_2.pdb         40_PV-001702179999_2.pdb     63_Z2042828126_1.pdb         86_Z2723142397_2_T1.pdb
19_Z2700586182_2.pdb         41_Z2596550737_1.pdb         64_Z2144418621_2.pdb         87_Z829994926_1_T1.pdb
1_Z2624037004_4.pdb          42_Z2713537244_2.pdb         65_Z2593207602_1_T1.pdb      88_PV-000256089451_1_T1.pdb
20_PV-001958058751_2.pdb     43_Z1418769667_1.pdb         66_Z510613592_1_T2.pdb       89_PV-000979741330_1_T1.pdb
21_PV-001702179999_1.pdb     44_Z1418769667_2.pdb         67_PV-000902495780_1_T1.pdb  8_Z2087260951_3.pdb
22_PV-000380950674_1.pdb     45_PV-001288562049_1.pdb     68_PV-000902495780_2_T1.pdb  90_PV-001002400892_1_T1.pdb
23_PV-001958058751_1.pdb     46_PV-000378673869_1.pdb     69_Z2084379853_2_T1.pdb      91_PV-001378044208_2_T1.pdb
24_Z2211137992_3.pdb         47_PV-001826919885_1.pdb     6_PV-001701895824_1.pdb      92_PV-001743414951_1_T1.pdb
25_Z2717222271_1.pdb         48_PV-001826919885_2.pdb     70_Z1656518334_1.pdb         93_Z2364787117_2_T1.pdb
26_PV-000376279119_1.pdb     49_Z2211137992_1.pdb         71_Z1656518334_2.pdb         94_Z2366885184_1_T1.pdb
27_PV-000376279119_2.pdb     4_Z2087260951_2.pdb          72_Z1897122191_2.pdb         95_Z1103196794_1_T1.pdb
28_PV-001958058751_4.pdb     50_Z812712648_1.pdb          73_Z2155602585_3.pdb         96_Z1897122191_3.pdb
29_Z2700586182_1.pdb         51_Z812712648_5.pdb          74_Z2155602585_4.pdb         97_PV-001826919885_5.pdb
2_Z2624037004_3.pdb          52_Z812712648_8.pdb          75_Z2717222271_2.pdb         98_PV-001826919885_6.pdb
30_PV-000286243379_1_T1.pdb  53_Z2713204296_1.pdb         76_Z2717222271_3.pdb         99_Z812712648_7.pdb
31_Z2893380031_1_T1.pdb      54_Z449211618_1.pdb          77_PV-001287209271_1.pdb     9_PV-001958058751_3.pdb

Visualization

  • connect to the cluster via NICE-DCV

    pcluster dcv connect -k <KEY_NAME> <CLUSTER_NAME>
    
  • visualize the docking pose PDB files in PyMOL
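
For example, inside the DCV session (assuming PyMOL is installed on the master node; it is not part of the tutorial setup):

$ cd pp/docking_poses/qvina02_rigid_receptor1/dockingsposes.plain
$ pymol *.pdb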

Considerations

  • some docking simulations failed
  • use Spot Instances with the Slurm sbatch --requeue option (see the sketch below)
  • adjust the number of joblines
  • use FSx for Lustre
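
A sketch of the Spot idea: in the ParallelCluster 2.x config, the compute fleet can be switched to Spot with cluster_type = spot in the [cluster default] section, and the job template made requeue-able so interrupted jobs return to the queue. The directives below are standard Slurm; the values are assumptions:

#!/bin/bash
#SBATCH --partition=compute
#SBATCH --requeue              # put the job back in the queue if the node is reclaimed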

etc

count ligands

#!/bin/bash -xe

# Count the ligands (.pdbqt files) in each collection of the downloaded
# ligand library, writing one "<tranche>_<collection> <count>" line per collection.

input_folder=CF
temp_folder=tmp
output_filename=collections.txt

mkdir -p "${temp_folder}/${input_folder}"

for metatranche in $(ls ${input_folder}); do
    for tranche in $(ls ${input_folder}/${metatranche}); do
        echo " * Extracting ${tranche} to ${temp_folder}"
        # each tranche archive unpacks into a folder named after the tranche
        tar -xf "${input_folder}/${metatranche}/${tranche}" -C "${temp_folder}/${input_folder}" || true
        for file in $(ls ${temp_folder}/${input_folder}/${tranche%%.*}); do
            echo " * Adding file ${temp_folder}/${input_folder}/${tranche%%.*}/${file} to ${output_filename}"
            tmp_tranche=${tranche%.tar}
            count=$(tar tf "${temp_folder}/${input_folder}/${tranche%%.*}/${file}" | grep '\.pdbqt$' | wc -l)
            echo "${tmp_tranche##*/}_${file%%.*} ${count}" >> "${output_filename}"
        done
    done
done

rm -r "${temp_folder}"
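
Example usage, assuming the script is saved as count_ligands.sh (the name is arbitrary) next to the downloaded CF folder; each line of collections.txt should then look like "GACEBG_00000 <ligand count>":

$ chmod +x count_ligands.sh
$ ./count_ligands.sh
$ head collections.txt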