# Generate ssh key
ssh-keygen -t rsa -b 4096 -C "youremail@something.com"
eval "$(ssh-agent -s)"
ssh-add -K ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub
# Setup ssh key forwarding (in the client machine)
vim ~/.ssh/config
Host *
  ForwardAgent yes
Lowering:
=====================
// attr [R] storage_scope = "global"
allocate R[float32 * ((bsz*d1)*d2)]
produce R {
  // attr [iter_var(blockIdx.z, , blockIdx.z)] thread_extent = bsz
  // attr [R.local] storage_scope = "local"
  allocate R.local[float32 * 64]
  // attr [A.shared] storage_scope = "shared"
  allocate A.shared[float32 * 512]
import time
import torch
import tvm
from tvm.contrib import dlpack
from tvm import te
def _compile_function(b0: int = 4, b1: int = 4, b2: int = 16):
    bsz = te.var('bsz')
    d1 = te.var('d1')
Metric: CompileTime
  TotalSamples: 13
  Accumulator: 510ms604.405us
  ValueRate: 012ms791.510us / second
  Rate: 0.300801 / second
  Percentiles: 1%=003ms193.068us; 5%=003ms193.068us; 10%=003ms370.462us; 20%=004ms592.736us; 50%=005ms794.690us; 80%=124ms659.441us; 90%=124ms151.474us; 95%=219ms912.882us; 99%=219ms912.882us
Metric: ExecuteTime
  TotalSamples: 264
  Accumulator: 08s532ms717.936us
  ValueRate: 068ms308.366us / second
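The duration strings in the report above (for example `510ms604.405us`) are awkward to compare by eye. The following is a minimal sketch of how they could be converted to seconds and averaged per sample. The helper names `parse_duration` and `mean_per_sample` are hypothetical and not part of torch_xla, and the parser assumes only the `s` / `ms` / `us` units that appear in this excerpt.

```python
import re

# Assumption: durations use only the s / ms / us units seen in the excerpt.
_DURATION_RE = re.compile(r"(?:(\d+)s)?(?:(\d+)ms)?(?:(\d+(?:\.\d+)?)us)?")


def parse_duration(text: str) -> float:
    """Convert a string such as '08s532ms717.936us' to seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds, millis, micros = match.groups()
    return (
        float(seconds or 0)
        + float(millis or 0) / 1e3
        + float(micros or 0) / 1e6
    )


def mean_per_sample(accumulator: str, total_samples: int) -> float:
    """Average time per sample, in seconds (Accumulator / TotalSamples)."""
    return parse_duration(accumulator) / total_samples


if __name__ == "__main__":
    # Values copied from the CompileTime and ExecuteTime entries above.
    print("compile:", mean_per_sample("510ms604.405us", 13), "s/sample")
    print("execute:", mean_per_sample("08s532ms717.936us", 264), "s/sample")
```

Dividing the accumulator by the sample count gives a rough mean only; the percentile line in the report is a better guide to how skewed the distribution is.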
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name
4.60628s 6.7002ms (104 1 1) (256 1 1) 40 0B 0B - - - - Tesla K80 (0) 1 7 _ZN84_GLOBAL__N__60_tmpxft_00007ca2_00000000_11_Distributions_compute_75_cpp1_ii_c3aa7ee643distribution_elementwise_grid_stride_kernelIfLi4EZZZN2at6native19uniform_kernel_cudaERNS1_14TensorIteratorEddPNS1_9GeneratorEENKUlvE_clEvENKUlvE0_clEvEUlP24curandStatePhilox4_32_10E0_ZNS_27distribution_nullary_kernelIffLi4ESB_ZZZNS2_19uniform_kernel_cudaES4_ddS6_ENKS7_clEvENKS8_clEvEUlfE_EEvS4_PNS1_13CUDAGeneratorERKT2_T3_EUlifE_EEviSt4pairImmET1_SG_ [212]
4.61298s 6.6982ms (104 1 1) (256 1 1) 40 0B 0B - - - - Tesla K80 (0) 1 7 _ZN84_GLOBAL__N__60_tmpxft_00007ca2_00000000_11_Distributio
2019-09-20 23:35:07 [] Cluster configuration: {client_workers: [{10.6.32.15, n1-standard-16, europe-west4-a, xla-0ffn}, {10.6.32.27, n1-standard-16, europe-west4-a, xla-1hgb}, {10.6.32.60, n1-standard-16, europe-west4-a, xla-1jc5}, {10.6.32.109, n1-standard-16, europe-west4-a, xla-2k9d}, {10.6.32.12, n1-standard-16, europe-west4-a, xla-2wds}, {10.6.32.41, n1-standard-16, europe-west4-a, xla-38d4}, {10.6.32.118, n1-standard-16, europe-west4-a, xla-3wx4}, {10.6.32.116, n1-standard-16, europe-west4-a, xla-4zkl}, {10.6.32.52, n1-standard-16, europe-west4-a, xla-51lv}, {10.6.32.66, n1-standard-16, europe-west4-a, xla-52lk}, {10.6.32.127, n1-standard-16, europe-west4-a, xla-5317}, {10.6.32.126, n1-standard-16, europe-west4-a, xla-5520}, {10.6.47.196, n1-standard-16, europe-west4-a, xla-5vk3}, {10.6.32.10, n1-standard-16, europe-west4-a, xla-753h}, {10.6.32.51, n1-standard-16, europe-west4-a, xla-7llm}, {10.6.32.32, n1-standard-16, europe-west4-a, xla-8bsj}, {10.6.47.194, n1-standard-16, europe-west4-a, xla-9q8t},
beltagy@xla-group-0bcq:/usr/share/torch-xla-nightly/pytorch/xla$ conda activate pytorch-nightly
(pytorch-nightly) beltagy@xla-group-0bcq:/usr/share/torch-xla-nightly/pytorch/xla$ python torch_xla_py/xla_dist.py --tpu=tpu512 --conda-env=pytorch-nightly --env=ABC=1 -- python /usr/share/torch-xla-nightly/pytorch/xla/test/test_train_cifar.py
2019-09-20 04:23:48 [] Command to distribute: "python" "/usr/share/torch-xla-nightly/pytorch/xla/test/test_train_cifar.py"
2019-09-20 04:23:48 [] Cluster configuration: {client_workers: [{10.6.47.220, n1-standard-16, europe-west4-a, xla-group-0bcq}, {10.6.47.200, n1-standard-16, europe-west4-a, xla-group-0gt7}, {10.6.47.218, n1-standard-16, europe-west4-a, xla-group-1lhq}, {10.6.47.251, n1-standard-16, europe-west4-a, xla-group-1x8f}, {10.6.47.232, n1-standard-16, europe-west4-a, xla-group-25jd}, {10.6.47.245, n1-standard-16, europe-west4-a, xla-group-2802}, {10.6.47.204, n1-standard-16, europe-west4-a, xla-group-2sxg}, {10.6.47.237, n1-standard-16, europe-west4-a, xla-gro
2019-09-07 01:17:07 [] Command to distribute: "python" "/usr/share/torch-xla-nightly/pytorch/xla/test/test_train_imagenet.py" "--fake_data"
2019-09-07 01:17:07 [] Cluster configuration: {client_workers: [{10.6.47.193, n1-standard-16, europe-west4-a, xla-group-050v}, {10.6.32.48, n1-standard-16, europe-west4-a, xla-group-07s7}, {10.6.32.38, n1-standard-16, europe-west4-a, xla-group-0vzx}, {10.6.32.22, n1-standard-16, europe-west4-a, xla-group-0zqc}, {10.6.32.13, n1-standard-16, europe-west4-a, xla-group-1jd9}, {10.6.32.31, n1-standard-16, europe-west4-a, xla-group-2qw4}, {10.6.32.52, n1-standard-16, europe-west4-a, xla-group-3pkr}, {10.6.32.28, n1-standard-16, europe-west4-a, xla-group-3qrr}, {10.6.32.29, n1-standard-16, europe-west4-a, xla-group-4hv6}, {10.6.32.26, n1-standard-16, europe-west4-a, xla-group-4s6r}, {10.6.32.11, n1-standard-16, europe-west4-a, xla-group-4w79}, {10.6.32.7, n1-standard-16, europe-west4-a, xla-group-5w85}, {10.6.32.35, n1-standard-16, europe-west4-a, xla-group-6757}, {10.6.32.14,
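The `Cluster configuration` lines above pack every client worker into one long line. A minimal sketch of pulling the workers out into structured records, so they can be counted or grouped, might look like this. The `Worker` field names (`ip`, `machine_type`, `zone`, `hostname`) are my reading of the four positional values and are an assumption, as is the helper name `parse_workers`; neither comes from `xla_dist.py`.

```python
import re
from typing import List, NamedTuple


class Worker(NamedTuple):
    # Field names are assumed from the shape of the log entries.
    ip: str
    machine_type: str
    zone: str
    hostname: str


# Matches one complete "{ip, machine_type, zone, hostname}" entry.
_WORKER_RE = re.compile(
    r"\{(\d+(?:\.\d+){3}), ([^,{}]+), ([^,{}]+), ([^,{}]+)\}"
)


def parse_workers(log_line: str) -> List[Worker]:
    """Return every complete worker entry found in a log line.

    Entries cut off mid-way (no closing brace) are skipped.
    """
    return [Worker(*m.groups()) for m in _WORKER_RE.finditer(log_line)]


if __name__ == "__main__":
    line = (
        "2019-09-07 01:17:07 [] Cluster configuration: {client_workers: "
        "[{10.6.47.193, n1-standard-16, europe-west4-a, xla-group-050v}, "
        "{10.6.32.48, n1-standard-16, europe-west4-a, xla-group-07s7}, "
        "{10.6.32.14,"
    )
    for worker in parse_workers(line):
        print(worker.hostname, worker.ip)
```

Skipping incomplete entries is a deliberate choice here, since the pasted log lines are truncated and a partial entry cannot be trusted.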
[ScheduleSyncTensorsGraph]
TensorsGraphInfo:
HloModule IrToHlo.92
%add_F32.12 (lhs.13: f32[], rhs.14: f32[]) -> f32[] {
  %lhs.13 = f32[] parameter(0)
  %rhs.14 = f32[] parameter(1)
  ROOT %add.15 = f32[] add(f32[] %lhs.13, f32[] %rhs.14)
}
2019-08-27 18:23:18.396066: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] >>> Dumping Computation 0 [42/2006]
2019-08-27 18:23:18.396287: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] HloModule SyncTensorsGraph.92
2019-08-27 18:23:18.396298: E tensorflow/compiler/xla/xla_client/xla_util.cc:72]
2019-08-27 18:23:18.396319: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] %add_F32.12 (lhs.13: f32[], rhs.14: f32[]) -> f32[] {
2019-08-27 18:23:18.396328: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] %lhs.13 = f32[] parameter(0)
2019-08-27 18:23:18.396353: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] %rhs.14 = f32[] parameter(1)
2019-08-27 18:
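Each line of the dump above carries a timestamp, severity and source-location prefix in front of the HLO text. A rough sketch of stripping that prefix to recover the bare HLO module is shown below. The helper name `strip_log_prefix` is hypothetical, and the regular expression only assumes the prefix layout visible in these lines (date, time, one-letter severity, `file:line]`).

```python
import re
from typing import Iterable, List

# Prefix layout assumed from the excerpt:
#   "2019-08-27 18:23:18.396066: E path/to/file.cc:72] payload"
_PREFIX_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: [A-Z] \S+:\d+\] ?(.*)$"
)


def strip_log_prefix(lines: Iterable[str]) -> List[str]:
    """Return the payload of every line that has a full log prefix.

    Lines without a complete prefix (for example a truncated last line)
    are dropped rather than guessed at.
    """
    payload = []
    for line in lines:
        match = _PREFIX_RE.match(line.rstrip("\n"))
        if match:
            payload.append(match.group(1))
    return payload


if __name__ == "__main__":
    prefix = "E tensorflow/compiler/xla/xla_client/xla_util.cc:72]"
    sample = [
        f"2019-08-27 18:23:18.396287: {prefix} HloModule SyncTensorsGraph.92",
        f"2019-08-27 18:23:18.396298: {prefix}",
        f"2019-08-27 18:23:18.396328: {prefix} %lhs.13 = f32[] parameter(0)",
        "2019-08-27 18:",
    ]
    print("\n".join(strip_log_prefix(sample)))
```

Any indentation inside the HLO body is whatever survived in the log line after the single separator space; the sketch does not try to re-indent it.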