(partial) EasyBuild log for failed build of /scratch-node/casparl.7053181/eb-tmp/eb-2v8kq72n/files_pr20358/t/TensorFlow/TensorFlow-2.15.1-foss-2023a-CUDA-12.1.1.eb (PR(s) #20358) (easyblock PR(s) #3303)
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue2024-07-18 13:35:54.553401: E tensorflow/core/common_runtime/base_collective_executor.cc:249] BaseCollectiveExecutor::StartAbort INTERNAL: NCCL: unhandled cuda error (run with NCCL_DEBUG=INFO for details). Set NCCL_DEBUG=WARN for detail.
2024-07-18 13:35:54.553470: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at collective_ops.cc:320 : INTERNAL: Collective ops is aborted by: NCCL: unhandled cuda error (run with NCCL_DEBUG=INFO for details). Set NCCL_DEBUG=WARN for detail.
The error could be from a previous operation. Restart your program to reset. [type.googleapis.com/tensorflow.DerivedStatus='']
[ FAILED ] CollectiveOpGPUTest.testNcclStress
INFO:tensorflow:time(__main__.CollectiveOpGPUTest.testNcclStress): 1.73s
I0718 13:35:54.555011 22717969134464 test_util.py:2574] time(__main__.CollectiveOpGPUTest.testNcclStress): 1.73s
[ RUN ] CollectiveOpGPUTest.test_session
[ SKIPPED ] CollectiveOpGPUTest.test_session
======================================================================
ERROR: testNcclStress (__main__.CollectiveOpGPUTest.testNcclStress)
CollectiveOpGPUTest.testNcclStress
----------------------------------------------------------------------
Traceback (most recent call last):
File "/scratch-node/casparl.7053181/eb-build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/90dfda158e6c36a7b501f9dc86aa7413/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/collective_ops_gpu_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/collective_ops_gpu_test.py", line 279, in testNcclStress
collective_ops.all_reduce(
File "/scratch-node/casparl.7053181/eb-build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/90dfda158e6c36a7b501f9dc86aa7413/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/collective_ops_gpu_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/collective_ops.py", line 59, in all_reduce
return gen_collective_ops.collective_reduce(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-node/casparl.7053181/eb-build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/90dfda158e6c36a7b501f9dc86aa7413/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/collective_ops_gpu_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/gen_collective_ops.py", line 998, in collective_reduce
return collective_reduce_eager_fallback(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-node/casparl.7053181/eb-build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/90dfda158e6c36a7b501f9dc86aa7413/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/collective_ops_gpu_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/gen_collective_ops.py", line 1088, in collective_reduce_eager_fallback
_result = _execute.execute(b"CollectiveReduce", 1, inputs=_inputs_flat,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-node/casparl.7053181/eb-build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/90dfda158e6c36a7b501f9dc86aa7413/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/collective_ops_gpu_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__CollectiveReduce_device_/job:localhost/replica:0/task:0/device:GPU:0}} Collective ops is aborted by: NCCL: unhandled cuda error (run with NCCL_DEBUG=INFO for details). Set NCCL_DEBUG=WARN for detail.
The error could be from a previous operation. Restart your program to reset. [Op:CollectiveReduce]
----------------------------------------------------------------------
Ran 13 tests in 5.004s
FAILED (errors=1, skipped=12)
.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:115 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] enqueue.cc:128 NCCL WARN Cuda failure 'named symbol not found'
gcn74:1582888:1583878 [0] NCCL INFO init.cc:1332 -> 1
gcn74:1582888:1583878 [0] NCCL INFO group.cc:65 -> 1 [Async thread]
gcn74:1582888:1583871 [0] NCCL INFO group.cc:406 -> 1
gcn74:1582888:1583871 [0] NCCL INFO group.cc:96 -> 1
== 2024-07-18 15:40:55,854 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:126 in __init__): At least 1 gpu tests failed:
//tensorflow/python/ops:collective_ops_gpu_test_gpu (at easybuild/framework/easyblock.py:2287 in report_test_failure)
== 2024-07-18 15:40:55,856 build_log.py:267 INFO ... (took 2 hours 23 mins 22 secs)
== 2024-07-18 15:40:55,856 build_log.py:267 INFO ... (took 2 hours 24 mins 38 secs)
== 2024-07-18 15:40:55,856 filetools.py:2012 INFO Removing lock /scratch-nvme/1/casparl/generic/software/.locks/_scratch-nvme_1_casparl_generic_software_TensorFlow_2.15.1-foss-2023a-CUDA-12.1.1.lock...
== 2024-07-18 15:40:55,859 filetools.py:383 INFO Path /scratch-nvme/1/casparl/generic/software/.locks/_scratch-nvme_1_casparl_generic_software_TensorFlow_2.15.1-foss-2023a-CUDA-12.1.1.lock successfully removed.
== 2024-07-18 15:40:55,859 filetools.py:2016 INFO Lock removed: /scratch-nvme/1/casparl/generic/software/.locks/_scratch-nvme_1_casparl_generic_software_TensorFlow_2.15.1-foss-2023a-CUDA-12.1.1.lock
== 2024-07-18 15:40:55,859 easyblock.py:4283 WARNING build failed (first 300 chars): At least 1 gpu tests failed:
//tensorflow/python/ops:collective_ops_gpu_test_gpu
== 2024-07-18 15:40:55,859 easyblock.py:328 INFO Closing log for application name TensorFlow version 2.15.1