
@sam186
sam186 / stacktrace
Last active October 11, 2017 18:34
pytorch dataparallel hang
#0  0x00007ffff76c1827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
        futex_word=0x7fff04000c10) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7fff04000c10, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007ffff76c18d4 in __new_sem_wait_slow (sem=0x7fff04000c10, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007ffff76c197a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007ffff7a61b33 in PyThread_acquire_lock_timed (lock=0x7fff04000c10, microseconds=-1000000, intr_flag=1)
        at Python/thread_pthread.h:354
#5  0x00007ffff7a68804 in acquire_timed (lock=0x7fff04000c10, timeout=-1000000000) at ./Modules/_threadmodule.c:68
#6  0x00007ffff7a68946 in lock_PyThread_acquire_lock (self=0x7ffff6456418, args=<optimized out>,
        kwds=<optimized out>) at ./Modules/_threadmodule.c:151
@sam186
sam186 / mnist.py
Created September 29, 2017 18:49
nn.DataParallel(model).cuda() hang
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable