On the P40 AI cluster, jobs can be submitted only through PBS
commands. Submitting a job is similar to sending a shell script to a GPU server: the script runs there and you get output log files back.
Like the old AI cluster, it consists of two kinds of nodes: an admin node and GPU nodes.
P40 AI Cluster IP: 10.15.22.198. Note that this IP address is currently not static and may change!
- Log in to the admin node with:
ssh username@10.15.22.198
- Change your password
yppasswd username
- You are now logged in to the admin node of the GPU server. The Internet is accessible only from the admin node.
Create a folder to store your scripts and log files:
mkdir pbs_tool && cd pbs_tool
To train your model on a GPU, you must submit a PBS script, and this can be done only from the admin node.
- PBS script example
#!/bin/bash
#PBS -N Example -q sist-hexm -l sist-gpu0x
echo "This is a test script"
pwd
nvidia-smi
Save this file as example.pb
Submit the job with this command:
qsub example.pb
Some log files will then be generated. You can check the output in Example.o123, where the .o*** suffix stands for output.
Ref: PBS Documents
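The submit-and-check steps above can be sketched as a small helper. This is only a sketch, assuming that qsub prints a job identifier of the usual form 123.servername and that the logs keep their default names (job name, then .o or .e, then the job number). The function names job_seq and log_names are made up for illustration and are not part of PBS.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative, not part of PBS.

# Extract the numeric part of a job identifier such as "123.admin".
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# Print the expected output and error log names for a job name and job ID.
log_names() {
    local name="$1" seq
    seq="$(job_seq "$2")"
    printf '%s.o%s %s.e%s\n' "$name" "$seq" "$name" "$seq"
}

# Submit only when qsub is available (i.e. on the admin node) and the script exists.
if command -v qsub >/dev/null 2>&1 && [ -f example.pb ]; then
    job_id="$(qsub example.pb)"
    echo "Submitted ${job_id}; expect logs: $(log_names Example "${job_id}")"
fi
```

Run it from the folder that holds example.pb; on a machine without qsub it only defines the two functions.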
- Run a script:
#!/bin/bash
#PBS -N YourLogFileName -q sist-hexm -l sist-gpu0x
echo "This is a test script"
cd YourProjectFolder
command you want to execute
  - YourLogFileName is the file name used for the output log and the error log
  - sist-hexm is the group name
  - sist-gpu0x is the computer ID
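If you submit many jobs, the script above can be generated instead of edited by hand. The following is a minimal sketch, not an official tool: make_pbs_script is a hypothetical helper, the group name and computer ID are simply copied from the example above, and the directive is written in the conventional "#PBS -N" form.

```shell
#!/bin/bash
# Hypothetical helper: write a PBS script for one job to the given file.
# Arguments: output file, job name, project folder, command to run.
# The group name (sist-hexm) and computer ID (sist-gpu0x) are taken from
# the example in this page; adjust them to your own allocation.
make_pbs_script() {
    local out="$1" name="$2" folder="$3" cmd="$4"
    cat > "$out" <<EOF
#!/bin/bash
#PBS -N ${name} -q sist-hexm -l sist-gpu0x
cd "${folder}"
${cmd}
EOF
}

# Example (train.py is a placeholder for your own program):
# make_pbs_script train.pb MyTrain "$HOME/my_project" "python train.py"
# qsub train.pb
```

Keeping the job name as a parameter means every run gets its own pair of log files.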
- Watch your output
  - You can use cat or vi to view the output.
  - You can use watch -n 0.1 tail -n 30 YourLogFileName.oxxx to monitor your output.
    - -n 0.1 means the display refreshes every 0.1 s
    - -n 30 sets how many lines to display
- Pay attention to YourLogFileName.exxx: error and warning messages are recorded there.
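Checking both logs can be combined into one step. The sketch below assumes the default log names (job name followed by .o or .e and the job number); show_job_logs is a hypothetical helper name, not a PBS command.

```shell
#!/bin/bash
# Hypothetical helper: summarise the logs of one job.
# Arguments: job name, numeric job ID, optional number of lines (default 30).
show_job_logs() {
    local out="$1.o$2" err="$1.e$2" lines="${3:-30}"
    if [ ! -f "$out" ]; then
        echo "No output log yet: $out"
        return 1
    fi
    tail -n "$lines" "$out"
    # -s is true only when the file exists and is not empty.
    if [ -s "$err" ]; then
        echo "--- $err is not empty, last lines: ---"
        tail -n "$lines" "$err"
    fi
}

# Example: show_job_logs Example 123
```

An empty error log is skipped, so anything printed after the separator line deserves a look.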
Debug your code on your local machine, and use this cluster only to run it and collect the results. Debugging through PBS is difficult!
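One way to make local debugging easier is to write the job script so that it also runs with plain bash on your own machine. The sketch below relies on the environment variables PBS_JOBID and PBS_O_WORKDIR, which PBS sets inside a job. The run_mode function is a hypothetical name, and whether this pattern suits your project is an assumption.

```shell
#!/bin/bash
# PBS sets PBS_JOBID (and PBS_O_WORKDIR, the directory qsub was run from)
# inside a job; both are normally unset in an ordinary local shell.
run_mode() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo "pbs"
    else
        echo "local"
    fi
}

# On the cluster, start from the submission directory; locally, stay put.
if [ "$(run_mode)" = "pbs" ]; then
    cd "${PBS_O_WORKDIR:-.}" || exit 1
fi
echo "Running in $(run_mode) mode"
# ... the command you want to execute goes here ...
```

The same file can then be tested locally with bash and submitted unchanged with qsub.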