Running XcalableMP sample programs using the official Docker image
Set up a Docker environment on a CentOS 6.1 VPS, launch the official image, and try running the sample programs.
[ryo@v133-18-203-78 ~]$ sudo yum install https://get.docker.com/rpm/1.7.1/centos-6/RPMS/x86_64/docker-engine-1.7.1-1.el6.x86_64.rpm
(output omitted)
Resolving Dependencies
--> Running transaction check
---> Package docker-engine.x86_64 0:1.7.1-1.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================================
Installing:
docker-engine x86_64 1.7.1-1.el6 /docker-engine-1.7.1-1.el6.x86_64 19 M
Transaction Summary
================================================================================================================================================
Install 1 Package(s)
Total size: 19 M
Installed size: 19 M
Is this ok [y/N]: Y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : docker-engine-1.7.1-1.el6.x86_64 1/1
Verifying : docker-engine-1.7.1-1.el6.x86_64 1/1
Installed:
docker-engine.x86_64 0:1.7.1-1.el6
Complete!
[ryo@v133-18-203-78 ~]$ sudo sh -c "curl -L https://github.com/docker/compose/releases/download/1.5.2/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7739k 100 7739k 0 0 2191k 0 0:00:03 0:00:03 --:--:-- 2998k
[ryo@v133-18-203-78 ~]$ sudo chmod +x /usr/local/bin/docker-compose
[ryo@v133-18-203-78 ~]$ sudo service docker start
Starting cgconfig service: [ OK ]
Starting docker: [ OK ]
[ryo@v133-18-203-78 ~]$ sudo docker run -it -u xmp -w /home/xmp omnicompiler/xcalablemp
Unable to find image 'omnicompiler/xcalablemp:latest' locally
latest: Pulling from omnicompiler/xcalablemp
1e88bef1d4a7: Pull complete
cb5df0650fd9: Pull complete
649eb51e8f47: Pull complete
e1fa8d0cb373: Pull complete
8632f6f59e32: Pull complete
42a8bbb955df: Pull complete
7908fe74e61b: Pull complete
cd49434f411b: Pull complete
be11fc5b1b86: Pull complete
3a29aa4ac303: Pull complete
8801446cbded: Pull complete
ebc5d0a45319: Pull complete
3ea1944a34dc: Pull complete
Digest: sha256:9cd1fa45afb9d3a50d0b7e217a597b45e1a91256b8c40dbbd141bab8383c43c0
Status: Downloaded newer image for omnicompiler/xcalablemp:latest
From here on, this is a shell inside the launched container.
xmp@88f28bc873a1:~$ ls -l
total 16
drwxr-xr-x 2 xmp xmp 4096 Aug 27 2017 2.globalview
-rw------- 1 xmp xmp 1689 Sep 4 2017 2.globalview.tgz
drwxr-xr-x 2 xmp xmp 4096 Sep 4 2017 3.localview
-rw------- 1 xmp xmp 2797 Sep 4 2017 3.localview.tgz
xmp@88f28bc873a1:~$ cd 2.globalview
xmp@88f28bc873a1:~/2.globalview$ ls
init.c laplace.c xmp_init.c xmp_init_ans.c xmp_laplace.c xmp_laplace_ans.c
init.f90 laplace.f90 xmp_init.f90 xmp_init_ans.f90 xmp_laplace.f90 xmp_laplace_ans.f90
xmp@88f28bc873a1:~/2.globalview$ cat laplace.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define N1 64
#define N2 64
double u[N2][N1], uu[N2][N1];
int main(int argc, char **argv)
{
  int j, i, k, niter = 100;
  double value = 0.0;
  for(j = 0; j < N2; j++){
    for(i = 0; i < N1; i++){
      u[j][i] = 0.0;
      uu[j][i] = 0.0;
    }
  }
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      u[j][i] = sin((double)i/N1*M_PI) + cos((double)j/N2*M_PI);
  for(k = 0; k < niter; k++){
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        uu[j][i] = u[j][i];
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        u[j][i] = (uu[j-1][i] + uu[j+1][i] + uu[j][i-1] + uu[j][i+1])/4.0;
  }
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      value += fabs(uu[j][i] - u[j][i]);
  printf("Verification = %20.16f\n", value);
  return 0;
}
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o laplace.out laplace.c
xmp@88f28bc873a1:~/2.globalview$ ls
init.c laplace.c laplace.o xmp_init.c xmp_init_ans.c xmp_laplace.c xmp_laplace_ans.c
init.f90 laplace.f90 laplace.out xmp_init.f90 xmp_init_ans.f90 xmp_laplace.f90 xmp_laplace_ans.f90
xmp@88f28bc873a1:~/2.globalview$ time ./laplace.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44880,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Verification = 5.5488557664881109
real 0m0.181s
user 0m0.026s
sys 0m0.023s
It ran, but since this file contains no XMP directives at all, all that happened was that the sequential program ran as a single process.
The following one appears to be parallelized with XMP.
xmp@88f28bc873a1:~/2.globalview$ cat xmp_laplace_ans.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define N1 64
#define N2 64
double u[N2][N1], uu[N2][N1];
#pragma xmp nodes p[*][4]
#pragma xmp template t[N2][N1]
#pragma xmp distribute t[block][block] onto p
#pragma xmp align u[j][i] with t[j][i]
#pragma xmp align uu[j][i] with t[j][i]
#pragma xmp shadow uu[1:1][1:1]
int main(int argc, char **argv)
{
  int i, j, k, niter = 100;
  double value = 0.0;
#pragma xmp loop (j,i) on t[j][i]
  for(j = 0; j < N2; j++){
    for(i = 0; i < N1; i++){
      u[j][i] = 0.0;
      uu[j][i] = 0.0;
    }
  }
#pragma xmp loop (j,i) on t[j][i]
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      u[j][i] = sin((double)i/N1*M_PI) + cos((double)j/N2*M_PI);
  for(k = 0; k < niter; k++){
#pragma xmp loop (j,i) on t[j][i]
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        uu[j][i] = u[j][i];
#pragma xmp reflect (uu)
#pragma xmp loop (j,i) on t[j][i]
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        u[j][i] = (uu[j-1][i] + uu[j+1][i] + uu[j][i-1] + uu[j][i+1])/4.0;
  }
#pragma xmp loop (j,i) on t[j][i] reduction(+:value)
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      value += fabs(uu[j][i] - u[j][i]);
#pragma xmp task on p[0][0]
  printf("Verification = %20.16f\n", value);
  return 0;
}
Let's run the parallelized one.
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o xmp_laplace_ans.out xmp_laplace_ans.c
xmp@88f28bc873a1:~/2.globalview$ ls
init.c laplace.c laplace.o xmp_init.c xmp_init_ans.c xmp_laplace.c xmp_laplace_ans.c xmp_laplace_ans.o
init.f90 laplace.f90 laplace.out xmp_init.f90 xmp_init_ans.f90 xmp_laplace.f90 xmp_laplace_ans.f90 xmp_laplace_ans.out
Tried to run it with 1 process, but got an error.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 1 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44695,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
Tried launching 2 processes on the same node, but got an error.
I chose 2 processes because /proc/cpuinfo inside the container appeared to show 2 cores.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 2 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44782,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
[RANK:1] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[88f28bc873a1:00837] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:00837] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[88f28bc873a1:00837] 1 more process has sent help message help-mpi-api.txt / mpi-abort
real 0m0.202s
user 0m0.064s
sys 0m0.081s
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
real 0m1.199s
user 0m0.039s
sys 0m0.063s
Tried launching 1 process while specifying the hostfile that lists the same node twice, but got an error.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 1 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44795,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
real 0m1.180s
user 0m0.039s
sys 0m0.042s
As a test, run the non-parallelized version with the same hostfile and 2 processes.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 2 --hostfile ./hostfile laplace.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44787,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Verification = 5.5488557664881109
Verification = 5.5488557664881109
[88f28bc873a1:00856] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:00856] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
real 0m0.235s
user 0m0.091s
sys 0m0.112s
It's pointless, but it did run to completion.
I haven't done anything about the MPI execution environment (the container image is used as-is), so either that setup needs to be done properly, or I'm fundamentally misunderstanding how to run MPI programs (although specifying a single localhost-only node and running just 1 process seems like it ought to work).
That's all from the field.
Addendum:
xmp_init.c worked.
xmp@88f28bc873a1:~/2.globalview$ cat xmp_init.c
#include <stdio.h>
#include <xmp.h>
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
int a[10];
int main(){
  for(int i=0;i<10;i++)
    a[i] = i+1;
  for(int i=0;i<10;i++)
    printf("[%d] %d\n", xmpc_node_num(), a[i]);
  return 0;
}
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o xmp_init.out xmp_init.c
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ mpirun -n 2 --hostfile ./hostfile ./xmp_init.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[43322,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[0] 1
[0] 2
[0] 3
[0] 4
[0] 5
[0] 6
[0] 7
[0] 8
[0] 9
[0] 10
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[88f28bc873a1:01169] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:01169] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
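A side note on that output: both ranks print all ten elements because a is an ordinary global array that is not aligned with the template t, so every node holds a full copy and executes the full loops. Below is a minimal sketch of a distributed variant (my own modification, not one of the sample files, and untested here) where each of the 2 nodes would handle only its own block of 5 elements:

#include <stdio.h>
#include <xmp.h>
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
int a[10];
#pragma xmp align a[i] with t[i]      /* distribute a over the 2 nodes */
int main(){
#pragma xmp loop (i) on t[i]          /* each node initializes only its own block */
  for(int i=0;i<10;i++)
    a[i] = i+1;
#pragma xmp loop (i) on t[i]          /* each node prints only its own block */
  for(int i=0;i<10;i++)
    printf("[%d] %d\n", xmpc_node_num(), a[i]);
  return 0;
}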
I had a feeling the number of processes didn't match what the program expects, so I read the number 4 off the code and specified it; this time execution started without any error!
xmp@88f28bc873a1:~/2.globalview$ mpirun -n 4 ./xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[43036,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
[88f28bc873a1:01463] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:01463] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
(At this point it seems to be computing, but it takes forever -> I waited 2-3 hours, but a computation that should finish in under a second when run sequentially never completed, so I killed it.
 I vaguely remember Docker container networking being somewhat special, so that might be the cause. However, with the mpirun parameters used this time it should just be doing IPC within a single node, so I really don't know.)
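For what it's worth, Open MPI can also be told to skip the InfiniBand transport entirely, e.g. mpirun --mca btl ^openib -n 4 ./xmp_laplace_ans.out. I have not tried that here, but it would at least rule the openib module (the source of those libibverbs warnings) in or out as the cause of the hang.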
Looking at the running processes from another SSH session while it was stuck, 4 processes were running.
[ryo@v133-18-203-78 ~]$ top
top - 12:51:03 up 403 days, 3:31, 2 users, load average: 4.11, 3.35, 1.74
Tasks: 173 total, 5 running, 168 sleeping, 0 stopped, 0 zombie
Cpu(s): 35.6%us, 64.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 2053904k total, 1532716k used, 521188k free, 243992k buffers
Swap: 2097148k total, 9240k used, 2087908k free, 889612k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15111 1000 20 0 495m 8756 5708 R 50.1 0.4 4:22.32 xmp_laplace_ans
15110 1000 20 0 495m 8760 5704 R 49.8 0.4 4:22.61 xmp_laplace_ans
15109 1000 20 0 495m 10m 5700 R 49.4 0.5 4:22.73 xmp_laplace_ans
15113 1000 20 0 468m 8668 5692 R 49.4 0.4 4:22.41 xmp_laplace_ans
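For reference, my reading of why 4 was the magic number (this is my interpretation, not something the sample explains): #pragma xmp nodes p[*][4] declares a two-dimensional node array whose second dimension is fixed at 4, so the total number of MPI processes has to be a multiple of 4. With -np 1 or 2 the runtime cannot build that node set, which is presumably what the "indicated communicator size is bigger than the actual communicator size" error was complaining about. A hypothetical one-line tweak (untested here) that should let the same program accept the 2 processes I tried at first:

/* Hypothetical change to xmp_laplace_ans.c: fix the second node
 * dimension at 2 instead of 4, so that "mpirun -n 2" matches it. */
#pragma xmp nodes p[*][2]    /* was: #pragma xmp nodes p[*][4] */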
Addendum 2:
In a WSL1 (the mechanism built into Windows 10 for running Linux) Ubuntu 18.04 environment, I was able to run a program that appears to have the same content as xmp_laplace_ans.c with 2 processes in parallel (single node).
For the record of that trial and error, see the thread starting at the following tweet.
https://twitter.com/ryo_grid/status/1335395232284741634