Last active
December 7, 2020 00:08
Running the XcalableMP sample programs using the official Docker image
Set up a Docker environment on a CentOS 6.1 VPS, start the official image, and try running the sample programs.
[ryo@v133-18-203-78 ~]$ sudo yum install https://get.docker.com/rpm/1.7.1/centos-6/RPMS/x86_64/docker-engine-1.7.1-1.el6.x86_64.rpm
(omitted)
Resolving Dependencies
--> Running transaction check
---> Package docker-engine.x86_64 0:1.7.1-1.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package         Arch     Version       Repository                          Size
================================================================================
Installing:
 docker-engine   x86_64   1.7.1-1.el6   /docker-engine-1.7.1-1.el6.x86_64   19 M
Transaction Summary
================================================================================
Install       1 Package(s)
Total size: 19 M
Installed size: 19 M
Is this ok [y/N]: Y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : docker-engine-1.7.1-1.el6.x86_64    1/1
  Verifying  : docker-engine-1.7.1-1.el6.x86_64    1/1
Installed:
  docker-engine.x86_64 0:1.7.1-1.el6
Complete!
[ryo@v133-18-203-78 ~]$ sudo sh -c "curl -L https://github.com/docker/compose/releases/download/1.5.2/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7739k  100 7739k    0     0  2191k      0  0:00:03  0:00:03 --:--:-- 2998k
[ryo@v133-18-203-78 ~]$ sudo chmod +x /usr/local/bin/docker-compose
[ryo@v133-18-203-78 ~]$ sudo service docker start
Starting cgconfig service: [ OK ]
Starting docker: [ OK ]
[ryo@v133-18-203-78 ~]$ sudo docker run -it -u xmp -w /home/xmp omnicompiler/xcalablemp
Unable to find image 'omnicompiler/xcalablemp:latest' locally
latest: Pulling from omnicompiler/xcalablemp
1e88bef1d4a7: Pull complete
cb5df0650fd9: Pull complete
649eb51e8f47: Pull complete
e1fa8d0cb373: Pull complete
8632f6f59e32: Pull complete
42a8bbb955df: Pull complete
7908fe74e61b: Pull complete
cd49434f411b: Pull complete
be11fc5b1b86: Pull complete
3a29aa4ac303: Pull complete
8801446cbded: Pull complete
ebc5d0a45319: Pull complete
3ea1944a34dc: Pull complete
Digest: sha256:9cd1fa45afb9d3a50d0b7e217a597b45e1a91256b8c40dbbd141bab8383c43c0
Status: Downloaded newer image for omnicompiler/xcalablemp:latest
From here on, the shell is inside the started container.
xmp@88f28bc873a1:~$ ls -l
total 16
drwxr-xr-x 2 xmp xmp 4096 Aug 27  2017 2.globalview
-rw------- 1 xmp xmp 1689 Sep  4  2017 2.globalview.tgz
drwxr-xr-x 2 xmp xmp 4096 Sep  4  2017 3.localview
-rw------- 1 xmp xmp 2797 Sep  4  2017 3.localview.tgz
xmp@88f28bc873a1:~$ cd 2.globalview
xmp@88f28bc873a1:~/2.globalview$ ls
init.c    laplace.c    xmp_init.c    xmp_init_ans.c    xmp_laplace.c    xmp_laplace_ans.c
init.f90  laplace.f90  xmp_init.f90  xmp_init_ans.f90  xmp_laplace.f90  xmp_laplace_ans.f90
xmp@88f28bc873a1:~/2.globalview$ cat laplace.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define N1 64
#define N2 64
double u[N2][N1], uu[N2][N1];

int main(int argc, char **argv)
{
  int j, i, k, niter = 100;
  double value = 0.0;

  for(j = 0; j < N2; j++){
    for(i = 0; i < N1; i++){
      u[j][i] = 0.0;
      uu[j][i] = 0.0;
    }
  }

  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      u[j][i] = sin((double)i/N1*M_PI) + cos((double)j/N2*M_PI);

  for(k = 0; k < niter; k++){
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        uu[j][i] = u[j][i];

    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        u[j][i] = (uu[j-1][i] + uu[j+1][i] + uu[j][i-1] + uu[j][i+1])/4.0;
  }

  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      value += fabs(uu[j][i] - u[j][i]);

  printf("Verification = %20.16f\n", value);
  return 0;
}
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o laplace.out laplace.c
xmp@88f28bc873a1:~/2.globalview$ ls
init.c    laplace.c    laplace.o    xmp_init.c    xmp_init_ans.c    xmp_laplace.c    xmp_laplace_ans.c
init.f90  laplace.f90  laplace.out  xmp_init.f90  xmp_init_ans.f90  xmp_laplace.f90  xmp_laplace_ans.f90
xmp@88f28bc873a1:~/2.globalview$ time ./laplace.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44880,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Verification = 5.5488557664881109
real 0m0.181s
user 0m0.026s
sys 0m0.023s
It ran, but this file contains no XMP directives, so it was just a sequential program running as a single process.
The following one appears to be parallelized with XMP.
xmp@88f28bc873a1:~/2.globalview$ cat xmp_laplace_ans.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define N1 64
#define N2 64
double u[N2][N1], uu[N2][N1];
#pragma xmp nodes p[*][4]
#pragma xmp template t[N2][N1]
#pragma xmp distribute t[block][block] onto p
#pragma xmp align u[j][i] with t[j][i]
#pragma xmp align uu[j][i] with t[j][i]
#pragma xmp shadow uu[1:1][1:1]

int main(int argc, char **argv)
{
  int i, j, k, niter = 100;
  double value = 0.0;

#pragma xmp loop (j,i) on t[j][i]
  for(j = 0; j < N2; j++){
    for(i = 0; i < N1; i++){
      u[j][i] = 0.0;
      uu[j][i] = 0.0;
    }
  }

#pragma xmp loop (j,i) on t[j][i]
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      u[j][i] = sin((double)i/N1*M_PI) + cos((double)j/N2*M_PI);

  for(k = 0; k < niter; k++){
#pragma xmp loop (j,i) on t[j][i]
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        uu[j][i] = u[j][i];

#pragma xmp reflect (uu)

#pragma xmp loop (j,i) on t[j][i]
    for(j = 1; j < N2-1; j++)
      for(i = 1; i < N1-1; i++)
        u[j][i] = (uu[j-1][i] + uu[j+1][i] + uu[j][i-1] + uu[j][i+1])/4.0;
  }

#pragma xmp loop (j,i) on t[j][i] reduction(+:value)
  for(j = 1; j < N2-1; j++)
    for(i = 1; i < N1-1; i++)
      value += fabs(uu[j][i] - u[j][i]);

#pragma xmp task on p[0][0]
  printf("Verification = %20.16f\n", value);
  return 0;
}
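The `shadow uu[1:1][1:1]` and `reflect (uu)` directives in the listing above give each node one extra cell on every side of its block and refresh those cells from the neighbouring nodes before the stencil is applied. The following is a minimal plain-C sketch of that idea in one dimension. It simulates two blocks inside a single process; it is not how the XMP runtime implements `reflect` (which communicates between processes), and `block_t`, `reflect_1d`, `NBLOCKS` and `LOCAL_N` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define NBLOCKS 2  /* hypothetical: number of simulated nodes */
#define LOCAL_N 4  /* hypothetical: elements owned by each node */

/* One simulated node: cell[1..LOCAL_N] are owned elements,
   cell[0] and cell[LOCAL_N + 1] are the shadow (halo) cells. */
typedef struct {
    double cell[LOCAL_N + 2];
} block_t;

/* Copy each block's edge values into the adjacent block's shadow cells.
   The outermost shadow cells have no neighbour and are left untouched. */
void reflect_1d(block_t blocks[], size_t nblocks)
{
    for (size_t b = 0; b + 1 < nblocks; b++) {
        /* last owned element of block b -> lower shadow of block b+1 */
        blocks[b + 1].cell[0] = blocks[b].cell[LOCAL_N];
        /* first owned element of block b+1 -> upper shadow of block b */
        blocks[b].cell[LOCAL_N + 1] = blocks[b + 1].cell[1];
    }
}
```

After `reflect_1d`, a stencil such as `(cell[i-1] + cell[i+1]) / 2.0` can be evaluated for every owned index `i` in `1..LOCAL_N` without touching another block's storage, which is the role the shadow region plays for `uu[j-1][i]`, `uu[j+1][i]`, `uu[j][i-1]` and `uu[j][i+1]` in the sample.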
Let's try running the parallelized version.
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o xmp_laplace_ans.out xmp_laplace_ans.c
xmp@88f28bc873a1:~/2.globalview$ ls
init.c    laplace.c    laplace.o    xmp_init.c    xmp_init_ans.c    xmp_laplace.c    xmp_laplace_ans.c    xmp_laplace_ans.o
init.f90  laplace.f90  laplace.out  xmp_init.f90  xmp_init_ans.f90  xmp_laplace.f90  xmp_laplace_ans.f90  xmp_laplace_ans.out
Trying to run it with 1 process gives an error.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 1 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44695,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
Trying to launch 2 processes on the same node also gives an error.
I chose 2 processes because /proc/cpuinfo inside the container showed 2 cores.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 2 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44782,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
[RANK:1] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[88f28bc873a1:00837] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:00837] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[88f28bc873a1:00837] 1 more process has sent help message help-mpi-api.txt / mpi-abort
real 0m0.202s
user 0m0.064s
sys 0m0.081s
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
real 0m1.199s
user 0m0.039s
sys 0m0.063s
Trying to launch 1 process with the same hostfile, which lists two nodes, also gives an error.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 1 --hostfile ./hostfile xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44795,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[RANK:0] XcalableMP runtime error: indicated communicator size is bigger than the actual communicator size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
real 0m1.180s
user 0m0.039s
sys 0m0.042s
As a test, run the non-parallelized program as 2 processes with the same hostfile.
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ time mpirun -np 2 --hostfile ./hostfile laplace.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[44787,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Verification = 5.5488557664881109
Verification = 5.5488557664881109
[88f28bc873a1:00856] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:00856] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
real 0m0.235s
user 0m0.091s
sys 0m0.112s
Pointless, but it did run to completion.
I have done nothing to the MPI runtime environment (it is exactly as shipped in the container image), so either it needs to be set up properly, or I fundamentally misunderstand how to run MPI programs (although specifying a single localhost node and running just 1 process seems like it ought to work).
That's all from the field.
Addendum:
xmp_init.c ran.
xmp@88f28bc873a1:~/2.globalview$ cat xmp_init.c
#include <stdio.h>
#include <xmp.h>
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
int a[10];

int main(){
  for(int i=0;i<10;i++)
    a[i] = i+1;

  for(int i=0;i<10;i++)
    printf("[%d] %d\n", xmpc_node_num(), a[i]);

  return 0;
}
xmp@88f28bc873a1:~/2.globalview$ xmpcc -o xmp_init.out xmp_init.c
xmp@88f28bc873a1:~/2.globalview$ cat hostfile
localhost
localhost
xmp@88f28bc873a1:~/2.globalview$ mpirun -n 2 --hostfile ./hostfile ./xmp_init.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[43322,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[0] 1
[0] 2
[0] 3
[0] 4
[0] 5
[0] 6
[0] 7
[0] 8
[0] 9
[0] 10
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[88f28bc873a1:01169] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:01169] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
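Both nodes print all ten elements here, presumably because xmp_init.c declares the template and its distribution but has no `align` or `loop` directive yet, so every node executes the whole loop on its own full copy of `a`. For reference, this is a small plain-C sketch of how a `block` distribution would split the ten template indices across the two nodes, assuming the usual rule that each node receives a contiguous chunk of ceil(N / nodes) indices. `range_t` and `block_range` are hypothetical helpers, not part of the XMP API.

```c
/* Half-open range [lo, hi) of template indices owned by one node. */
typedef struct {
    int lo;
    int hi;
} range_t;

/* Block distribution of n indices over nnodes nodes (assumed rule:
   chunk width = ceil(n / nnodes); trailing nodes may own fewer or none). */
range_t block_range(int n, int nnodes, int node)
{
    int width = (n + nnodes - 1) / nnodes;  /* ceil(n / nnodes) */
    range_t r;
    r.lo = node * width;
    r.hi = r.lo + width;
    if (r.lo > n) r.lo = n;  /* clamp nodes that fall past the end */
    if (r.hi > n) r.hi = n;
    return r;
}
```

With `n = 10` and `nnodes = 2`, node 0 would own indices 0 to 4 and node 1 would own indices 5 to 9, so a version of the program with the array aligned and the loops annotated would be expected to print five elements per node instead of ten.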
I suspected the process count did not match the number the program expects, so I took the number 4 from the code and specified it; this time there was no error and execution started!
xmp@88f28bc873a1:~/2.globalview$ mpirun -n 4 ./xmp_laplace_ans.out
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
--------------------------------------------------------------------------
[[43036,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
  Module: OpenFabrics (openib)
  Host: 88f28bc873a1
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
[88f28bc873a1:01463] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[88f28bc873a1:01463] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
(It takes a very long time here, perhaps because it is computing -> I waited 2-3 hours, but a calculation that should finish in under a second sequentially did not finish, so I killed it.
I recall Docker container networking being somewhat unusual, so that may be the cause. However, with the mpirun parameters used here
it should only be doing IPC, so I am not sure.)
Looking at the running processes from another SSH session, 4 processes are running.
[ryo@v133-18-203-78 ~]$ top
top - 12:51:03 up 403 days,  3:31,  2 users,  load average: 4.11, 3.35, 1.74
Tasks: 173 total,   5 running, 168 sleeping,   0 stopped,   0 zombie
Cpu(s): 35.6%us, 64.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   2053904k total,  1532716k used,   521188k free,   243992k buffers
Swap:  2097148k total,     9240k used,  2087908k free,   889612k cached
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15111 1000  20   0  495m 8756 5708 R 50.1  0.4   4:22.32 xmp_laplace_ans
15110 1000  20   0  495m 8760 5704 R 49.8  0.4   4:22.61 xmp_laplace_ans
15109 1000  20   0  495m  10m 5700 R 49.4  0.5   4:22.73 xmp_laplace_ans
15113 1000  20   0  468m 8668 5692 R 49.4  0.4   4:22.41 xmp_laplace_ans
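The runtime error seen with 1 and 2 processes, and its disappearance with 4, is consistent with the `#pragma xmp nodes p[*][4]` declaration in xmp_laplace_ans.c: one dimension is fixed at 4 and the `*` dimension is filled in from the number of launched processes. The sketch below encodes that reading as a plain-C check. It is an assumption drawn from the observed behaviour rather than a statement of what the XMP runtime does internally, and `infer_star_dim` is a hypothetical helper.

```c
/* For a node array with one `*` dimension and the remaining dimensions
   multiplying to `fixed`, return the size the `*` dimension would take
   when `nprocs` processes are launched, or 0 if `nprocs` cannot fill
   such an array (assumed rule: nprocs must be a positive multiple of
   `fixed`). */
int infer_star_dim(int nprocs, int fixed)
{
    if (nprocs <= 0 || fixed <= 0)
        return 0;
    if (nprocs % fixed != 0)
        return 0;
    return nprocs / fixed;
}
```

Under this assumption, `infer_star_dim(1, 4)` and `infer_star_dim(2, 4)` report a mismatch, while `infer_star_dim(4, 4)` yields a 1 x 4 node array, matching the process counts that failed and succeeded above.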
Addendum 2:
In an Ubuntu 18.04 environment on WSL1 (the mechanism built into Windows 10 for running Linux), I was able to run a program
that appears to be identical to xmp_laplace_ans.c with 2 parallel processes (single node).
For a record of the trial and error, see the tweet thread below.
https://twitter.com/ryo_grid/status/1335395232284741634