@gnperdue
Created January 6, 2016 22:24
MNIST tutorial error with more detail
Interactive GPU job:
qsub -q gpu -l nodes=1:gpu -A minervaG -I
. caffe_gpu_setup.sh
caffe.bin device_query -gpu 0
I0106 16:20:46.109344 29397 caffe.cpp:111] Querying GPUs 0
I0106 16:20:48.749228 29397 common.cpp:168] Device id: 0
I0106 16:20:48.749281 29397 common.cpp:169] Major revision number: 3
I0106 16:20:48.749287 29397 common.cpp:170] Minor revision number: 5
I0106 16:20:48.749294 29397 common.cpp:171] Name: Tesla K20m
I0106 16:20:48.749299 29397 common.cpp:172] Total global memory: 5032706048
I0106 16:20:48.749307 29397 common.cpp:173] Total shared memory per block: 49152
I0106 16:20:48.749312 29397 common.cpp:174] Total registers per block: 65536
I0106 16:20:48.749317 29397 common.cpp:175] Warp size: 32
I0106 16:20:48.749323 29397 common.cpp:176] Maximum memory pitch: 2147483647
I0106 16:20:48.749328 29397 common.cpp:177] Maximum threads per block: 1024
I0106 16:20:48.749332 29397 common.cpp:178] Maximum dimension of block: 1024, 1024, 64
I0106 16:20:48.749341 29397 common.cpp:181] Maximum dimension of grid: 2147483647, 65535, 65535
I0106 16:20:48.749346 29397 common.cpp:184] Clock rate: 705500
I0106 16:20:48.749351 29397 common.cpp:185] Total constant memory: 65536
I0106 16:20:48.749356 29397 common.cpp:186] Texture alignment: 512
I0106 16:20:48.749361 29397 common.cpp:187] Concurrent copy and execution: Yes
I0106 16:20:48.749372 29397 common.cpp:189] Number of multiprocessors: 13
I0106 16:20:48.749377 29397 common.cpp:190] Kernel execution timeout: No
First, get the data...
perdue@gpu1> pwd
/home/perdue/caffe/data/mnist
perdue@gpu1> ls -l
total 53744
-rwxr-xr-x 1 perdue e-938 788 Dec 20 17:02 get_mnist.sh
-rw-r--r-- 1 perdue e-938 7840016 Jul 21 2000 t10k-images-idx3-ubyte
-rw-r--r-- 1 perdue e-938 10008 Jul 21 2000 t10k-labels-idx1-ubyte
-rw-r--r-- 1 perdue e-938 47040016 Jul 21 2000 train-images-idx3-ubyte
-rw-r--r-- 1 perdue e-938 60008 Jul 21 2000 train-labels-idx1-ubyte
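Before converting, it can help to confirm that the four files were fully downloaded and unpacked. The sketch below compares each file's byte count with the sizes in the listing above. `check_size` and `check_mnist_dir` are hypothetical helper names, not part of Caffe.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of Caffe): compare a file's byte count
# with an expected value and report the result.
check_size() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "MISSING  $file"
        return 1
    fi
    # wc -c gives the byte count; tr strips padding that some wc builds add.
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "OK       $file"
    else
        echo "BAD SIZE $file ($actual bytes, expected $expected)"
        return 1
    fi
}

# Hypothetical helper: check all four MNIST files in one directory.
# Expected sizes are copied from the `ls -l` listing above.
check_mnist_dir() {
    dir=$1
    status=0
    check_size "$dir/train-images-idx3-ubyte" 47040016 || status=1
    check_size "$dir/train-labels-idx1-ubyte" 60008 || status=1
    check_size "$dir/t10k-images-idx3-ubyte" 7840016 || status=1
    check_size "$dir/t10k-labels-idx1-ubyte" 10008 || status=1
    return $status
}
```

After sourcing these definitions, `check_mnist_dir data/mnist` from the Caffe root prints one line per file and returns non-zero if any file is missing or has an unexpected size.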
Next, prepare the data. I needed to edit `get_mnist.sh` a bit to make it work.
perdue@gpu1> more examples/mnist/create_mnist.sh
#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
EXAMPLE=examples/mnist
DATA=data/mnist
BUILD=build/examples/mnist
BACKEND="lmdb"
echo "Creating ${BACKEND}..."
rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}
convert_mnist_data.bin $DATA/train-images-idx3-ubyte \
$DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
convert_mnist_data.bin $DATA/t10k-images-idx3-ubyte \
$DATA/t10k-labels-idx1-ubyte $EXAMPLE/mnist_test_${BACKEND} --backend=${BACKEND}
echo "Done."
Note comment in `get_mnist.sh`:
# Creation is split out because leveldb sometimes causes segfault
# and needs to be re-created.
Go to prep area...
perdue@gpu1> pwd
/home/perdue/caffe
perdue@gpu1> ls examples/mnist/create_mnist.sh
examples/mnist/create_mnist.sh
perdue@gpu1> ./examples/mnist/create_mnist.sh
Creating lmdb...
F0106 14:51:25.869540 29018 convert_mnist_data.cpp:91] Check failed: mdb_env_open(mdb_env, db_path, 0, 0664) == 0 (5 vs. 0) mdb_env_open failed
*** Check failure stack trace: ***
@ 0x2b5816751b4d google::LogMessage::Fail()
@ 0x2b5816755b67 google::LogMessage::SendToLog()
@ 0x2b58167539e9 google::LogMessage::Flush()
@ 0x2b5816753ced google::LogMessageFatal::~LogMessageFatal()
@ 0x403d29 convert_dataset()
@ 0x40462c main
@ 0x2b581ed19d5d __libc_start_main
@ 0x4025a9 (unknown)
./examples/mnist/create_mnist.sh: line 17: 29018 Aborted convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
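A possible first diagnostic, offered as a sketch rather than a confirmed fix: `mdb_env_open` passes system errno values through, and on Linux errno 5 is EIO (Input/output error). One cause worth ruling out, not verified for this cluster, is that the home area is a network filesystem on which LMDB's memory-mapped files or locks do not work. The helpers below report the filesystem type under a directory. The names `fs_type_of`, `is_network_fs`, and `report_fs` are hypothetical, and the list of network filesystem types is illustrative only.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of Caffe): print the filesystem type that
# holds a directory. Assumes a df that supports -T (e.g. GNU coreutils).
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Hypothetical helper: succeed if the type name looks network-backed.
# This list is an assumption for illustration and is not exhaustive.
is_network_fs() {
    case "$1" in
        nfs*|lustre|gpfs|cifs|smb*|afs|panfs|beegfs|fuse.sshfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: one-line report for a directory.
report_fs() {
    dir=$1
    fstype=$(fs_type_of "$dir")
    if is_network_fs "$fstype"; then
        echo "$dir is on $fstype (network filesystem; LMDB may have trouble here)"
    else
        echo "$dir is on $fstype"
    fi
}
```

After sourcing these definitions, `report_fs examples/mnist` from the Caffe root shows where the LMDB directory would be created. If it is network-backed, pointing the output path at node-local disk would be one thing to try.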