Install procedure for pyTorch on NVIDIA Jetson TX1/TX2 with JetPack <= 3.1. For JetPack 4.2 and Xavier/Nano/TX2, see https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/
#!/bin/bash
#
# UPDATE: see here for pre-built PyTorch 1.0/1.1 pip wheels
# for JetPack 4.2 and Jetson Nano/TX2/Xavier:
#
# https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/
#
# pyTorch install script for NVIDIA Jetson TX1/TX2,
# from a fresh flashing of JetPack 2.3.1 / JetPack 3.0 / JetPack 3.1
#
# for the full source, see jetson-reinforcement repo:
# https://github.com/dusty-nv/jetson-reinforcement/blob/master/CMakePreBuild.sh
#
# note: pyTorch documentation calls for use of Anaconda,
# however Anaconda isn't available for aarch64.
# Instead, we install directly from source using setup.py
sudo apt-get install python-pip
# upgrade pip
pip install -U pip
pip --version
# pip 9.0.1 from /home/ubuntu/.local/lib/python2.7/site-packages (python 2.7)
# clone pyTorch repo
git clone https://github.com/pytorch/pytorch
cd pytorch
git submodule update --init
# install prereqs
sudo pip install -U setuptools
sudo pip install -r requirements.txt
# Develop Mode:
python setup.py build_deps
sudo python setup.py develop
# Install Mode: (substitute for Develop Mode commands)
#sudo python setup.py install
# Verify CUDA (from python interactive terminal)
# import torch
# print(torch.__version__)
# print(torch.cuda.is_available())
# a = torch.cuda.FloatTensor(2)
# print(a)
# b = torch.randn(2).cuda()
# print(b)
# c = a + b
# print(c)
@sauhaardac commented Apr 16, 2017

When running the first command in Develop Mode or Install Mode I get a compilation error.
The full output of this script can be found at https://drive.google.com/file/d/0B50tKa1YUOubdWdMbE1USElwZ2s/view?usp=sharing
The error itself is shown below. Any help would be appreciated.

-- Build files have been written to: /home/ubuntu/working-directory/pytorchinstall/pytorch/torch/lib/build/THCUNN
[  1%] Building NVCC (Device) object CMakeFiles/THCUNN.dir//./THCUNN_generated_IndexLinear.cu.o
/home/ubuntu/working-directory/pytorchinstall/pytorch/torch/lib/THCUNN/IndexLinear.cu(98): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, const long)

/home/ubuntu/working-directory/pytorchinstall/pytorch/torch/lib/THCUNN/IndexLinear.cu(210): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (const long, long)

2 errors detected in the compilation of "/tmp/tmpxft_000010dd_00000000-7_IndexLinear.cpp1.ii".
CMake Error at THCUNN_generated_IndexLinear.cu.o.cmake:267 (message):
  Error generating file
  /home/ubuntu/working-directory/pytorchinstall/pytorch/torch/lib/build/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_IndexLinear.cu.o


make[2]: *** [CMakeFiles/THCUNN.dir/./THCUNN_generated_IndexLinear.cu.o] Error 1
make[1]: *** [CMakeFiles/THCUNN.dir/all] Error 2
make: *** [all] Error 2
@sauhaardac commented Apr 17, 2017

I saw that the nightly pytorch source apparently didn't pass the Linux GPU build with Python 2.7. I tried downloading the most recent stable release of PyTorch from https://github.com/pytorch/pytorch/releases/tag/v0.1.11 and then ran your script inside that stable copy of the source. The error from above went away, but I received a different error later on in the build.

[100%] Building NVCC (Device) object CMakeFiles/THCUNN.dir//./THCUNN_generated_SpatialSubSampling.cu.o
Scanning dependencies of target THCUNN
Linking CXX shared library libTHCUNN.so
[100%] Built target THCUNN
Install the project...
-- Install configuration: "Release"
-- Installing: /home/ubuntu/Downloads/pytorch/torch/lib/tmp_install/lib/libTHCUNN.so.1
-- Installing: /home/ubuntu/Downloads/pytorch/torch/lib/tmp_install/lib/libTHCUNN.so
-- Removed runtime path from "/home/ubuntu/Downloads/pytorch/torch/lib/tmp_install/lib/libTHCUNN.so.1"
-- Installing: /home/ubuntu/Downloads/pytorch/torch/lib/tmp_install/include/THCUNN/THCUNN.h
-- Installing: /home/ubuntu/Downloads/pytorch/torch/lib/tmp_install/include/THCUNN/generic/THCUNN.h
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda (found suitable version "7.0", minimum required is "7.0") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl
Scanning dependencies of target nccl
[100%] Generating lib/libnccl.so
ls: cannot access /usr/local/cuda/lib64/libcudart.so.*: No such file or directory
ls: cannot access /usr/local/cuda/lib64/libcudart.so.*: No such file or directory
Grabbing  src/nccl.h                > /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/include/nccl.h
Compiling src/libwrap.cu            > /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/obj/libwrap.o
Compiling src/core.cu               > /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/obj/core.o
Compiling src/all_gather.cu         > /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/obj/all_gather.o
Compiling src/all_reduce.cu         > /home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/obj/all_reduce.o
src/core.cu(724): error: expected an expression

src/core.cu(724): error: expected an expression

2 errors detected in the compilation of "/tmp/tmpxft_0000284b_00000000-16_core.compute_52.cpp1.ii".
make[3]: *** [/home/ubuntu/Downloads/pytorch/torch/lib/build/nccl/obj/core.o] Error 2
make[3]: *** Waiting for unfinished jobs....
ptxas warning : Too big maxrregcount value specified 96, will be ignored
ptxas warning : Too big maxrregcount value specified 96, will be ignored
ptxas warning : Too big maxrregcount value specified 96, will be ignored
make[2]: *** [lib/libnccl.so] Error 2
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
make: *** [all] Error 2

@soumith commented Apr 19, 2017

@sauhaardac please see if this is fixed with the latest master; I've added some fixes.

@sauhaardac commented Apr 19, 2017

@soumith Thank you! It works perfectly now.

@madratman commented Jun 12, 2017

Not able to get this working on a TX1 with Ubuntu 14.04 (I tried checking out the April 18th and 19th commits).
It works fine on a TX2, though.
Any ideas? Should I change the compiler, and to what?

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

/home/ubuntu/deep_libs/pytorch/torch/lib/THCS/generic/THCSTensorMath.cu(432): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, int)"
            function "min(unsigned int, unsigned int)"
            function "min(int, unsigned int)"
            function "min(unsigned int, int)"
            function "min(long long, long long)"
            function "min(unsigned long long, unsigned long long)"
            function "min(long long, unsigned long long)"
            function "min(unsigned long long, long long)"
            function "min(float, float)"
            function "min(double, double)"
            function "min(float, double)"
            function "min(double, float)"
            argument types are: (long, long)

8 errors detected in the compilation of "/tmp/tmpxft_000037d7_00000000-7_THCSTensor.cpp1.ii".
CMake Error at THCS_generated_THCSTensor.cu.o.cmake:267 (message):
  Error generating file
  /home/ubuntu/deep_libs/pytorch/torch/lib/build/THCS/CMakeFiles/THCS.dir//./THCS_generated_THCSTensor.cu.o


make[2]: *** [CMakeFiles/THCS.dir/./THCS_generated_THCSTensor.cu.o] Error 1
make[1]: *** [CMakeFiles/THCS.dir/all] Error 2
make: *** [all] Error 2
@dimatura commented Jun 21, 2017

For what it's worth on my TX1 with 64-bit 14.04, I had the same errors. Adding casts to long long here and there got things compiling.

@dimatura commented Jul 14, 2017

For anyone getting here via google, I think these issues arise from using the 14.04 Tegra images that have a 32-bit compiler toolchain (but 64-bit kernel). Even when I fixed these errors I ended up getting other errors. Using the newer 16.04 images, things compile and (mostly) work.
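To check which situation you are in, here is a minimal sketch (standard Linux commands; gcc may or may not be installed on your image) comparing the kernel architecture with the userspace word size:

```shell
# A 64-bit kernel with a 32-bit userspace matches the problem described above
uname -m                               # kernel architecture, e.g. aarch64
getconf LONG_BIT                       # userspace word size: 64 or 32
gcc -dumpmachine 2>/dev/null || true   # compiler target triple, if gcc is present
```

On a 16.04 aarch64 image, `getconf LONG_BIT` should report 64.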

@saharudra commented Aug 29, 2017

I followed the gist and it did not error out anywhere, but the following action failed.

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "torch/__init__.py", line 53, in <module>
    from torch._C import *
ImportError: No module named _C
@tapishr commented Sep 12, 2017

Followed this gist to install on Jetson TX1 with Ubuntu 16.04.3
Installation went smoothly. No errors.

Tried to import torch and got the following error:
ImportError: /usr/local/lib/python2.7/site-packages/torch/_C.so: undefined symbol: _ZN3MPI8Datatype4FreeEv

Used this workaround:
LD_PRELOAD=/usr/lib/libmpi_cxx.so python

Still couldn't import. The error is now:
ImportError: /usr/local/lib/python2.7/site-packages/torch/_C.so: undefined symbol: _ZN4thpp8THTensorIsEC1EP13THShortTensor

@dusty-nv (owner) commented Sep 14, 2017

Updated the gist to include git submodules that were added recently to pyTorch master.

If you still have trouble, see here for the full operational script as part of jetson-reinforcement repo.

@craic commented Sep 22, 2017

This worked just fine for me
Jetson TX1
Jetpack L4T_3.1
Ubuntu 16.04
Python 2.7
...and I followed the develop mode steps.

Thanks for posting this!

@craic commented Sep 22, 2017

And here is what I did to install torchvision once I had torch installed

sudo apt-get install libjpeg-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libpng-dev
sudo apt-get install python-matplotlib
sudo pip install Pillow
sudo pip install torchvision

To test it I followed this tutorial
http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
... you need to add this line to see the images at the end of step 1:
plt.show()

@mishraanup commented Sep 25, 2017

Thanks @dusty-nv and @soumith!

@JunhongXu commented Oct 18, 2017

I am installing on Jetson TX1 with Ubuntu 16.04 and Jetpack 2.4.

Everything works fine before sudo python setup.py develop. When I run this command, I get a "CuDNN version is 5, not 6" error, so I set WITH_CUDNN=False in setup.py. Then it installed without any error message.

However, after installing, when I run import torch, I get this error message:

import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/ubuntu/storage/pytorch/torch/__init__.py", line 53, in <module>
    from torch._C import *
ImportError: /media/ubuntu/storage/pytorch/torch/_C.so: undefined symbol: _ZN4gloo13EnforceNotMetC1EPKciS2_RKSs

I am importing pytorch from outside of the pytorch directory.

Edit:

It seems that the distributed package added to pytorch cannot be found in _C.so. I further set WITH_DISTRIBUTED_MW = False and WITH_DISTRIBUTED = False, and then I can successfully import torch without error.

Is there any way to use the distributed package in PyTorch on a Jetson TX1? Or are there more packages I need to install?
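The workaround above can be sketched as follows. The flag names are taken from this comment; whether your PyTorch revision reads them from the environment (rather than only from edits to setup.py) is an assumption to verify against your checkout:

```shell
# Assumed sketch: turn off the distributed package before building.
# WITH_DISTRIBUTED / WITH_DISTRIBUTED_MW are the flag names from this thread.
export WITH_DISTRIBUTED=0
export WITH_DISTRIBUTED_MW=0
# then rebuild, keeping the exported flags visible under sudo:
# sudo -E python setup.py develop
```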

@dthboyd commented Nov 13, 2017

@sauhaardac please share

@YogeshShitole commented Dec 1, 2017

When running pytorch_jetson_install.sh on a Jetson TX2 with Ubuntu 16.04 in develop mode, I get the error below. Can someone help with this?
error: [Errno 2] No such file or directory: '/home/ubuntu/pytorch/torch/lib/tmp_install/THD_deps.txt'

And when I run it in install mode, it throws the error below:
torch/lib/build_libs.sh: line 124: cmake: command not found

I am also getting this warning message:
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

@blink1747 commented Dec 13, 2017

@YogeshShitole,
There was an error with line 29 (python setup.py build_deps) of pytorch_jetson_install.sh, which is responsible for your error message.

@blink1747 commented Dec 13, 2017

@YogeshShitole,
Running the commands below fixed the cmake issue for me.
sudo add-apt-repository ppa:george-edison55/cmake-3.x
sudo apt-get update

@blink1747 commented Dec 13, 2017

Has anyone solved the following issue:
`/home/ubuntu/pytorch/aten/src/ATen/cudnn/cudnn-wrapper.h:10:2: error: #error "CuDNN version not supported"
#error "CuDNN version not supported"
^
CMake Error at ATen_generated_NativeFunctionsCuda.cu.o.cmake:207 (message):
Error generating
/home/ubuntu/pytorch/torch/lib/build/aten/src/ATen/CMakeFiles/ATen.dir/native/cuda/./ATen_generated_NativeFunctionsCuda.cu.o

src/ATen/CMakeFiles/ATen.dir/build.make:71019: recipe for target 'src/ATen/CMakeFiles/ATen.dir/native/cuda/ATen_generated_NativeFunctionsCuda.cu.o' failed
make[2]: *** [src/ATen/CMakeFiles/ATen.dir/native/cuda/ATen_generated_NativeFunctionsCuda.cu.o] Error 1
CMakeFiles/Makefile2:226: recipe for target 'src/ATen/CMakeFiles/ATen.dir/all' failed
make[1]: *** [src/ATen/CMakeFiles/ATen.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2`

@luongminh97 commented Jan 16, 2018

Anyone else getting this error near the end of build_deps? /usr/local/cuda/lib64/libcudnn.so: error adding symbols: File in wrong format
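"File in wrong format" from the linker usually indicates an architecture mismatch (for example a 32-bit toolchain linking against a 64-bit library, as discussed earlier in this thread). Assuming the `file` utility is installed, a quick way to check what a library was built for:

```shell
# Inspect the library the linker complained about (path from the error above);
# the output shows the ELF class and target architecture
file /usr/local/cuda/lib64/libcudnn.so 2>/dev/null || true
# expect something like: ELF 64-bit LSB shared object, ARM aarch64, ...
```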

@Hunterhal commented Jan 31, 2018

Hello, I have tried to install on a Jetson TX1, but it stops during the build with a segmentation error. I have tracked the error down: the build exceeds the available RAM. Any suggestions?

@derAtomkeks commented Feb 8, 2018

@Hunterhal Hmm, for me the build succeeds on the TX2 but fails on the TX1 as well. It seems to be a memory error, as gcc exits with error code 4. There is a tutorial on jetsonhacks.com on how to set up a swap file. I hope that helps. I'll try it myself next week and post updates here.

@dusty-nv (owner) commented Mar 18, 2018

@Hunterhal @derAtomkeks, it is because the TX1 has 4GB of memory (vs. 8GB on the TX2), so swap is needed. Or you can build a whl on a TX2 and install it on a TX1 running the same JetPack.

@thatwist commented Mar 20, 2018

When building on a TX2 with cuDNN 6, CUDA 8, and GCC 5.4, I got the following during the ATen build phase:

[ 47%] Building CXX object src/ATen/CMakeFiles/ATen.dir/__/TH/THVector.cpp.o
[ 48%] Building CXX object src/ATen/CMakeFiles/ATen.dir/__/THNN/init.cpp.o
[ 48%] Building CXX object src/ATen/CMakeFiles/ATen.dir/__/THS/THSTensor.cpp.o
[ 48%] Building CXX object src/ATen/CMakeFiles/ATen.dir/native/cpu/ReduceOpsKernel.cpp.AVX2.cpp.o
c++: error: unrecognized command line option ‘-mavx2’
src/ATen/CMakeFiles/ATen.dir/build.make:81805: recipe for target 'src/ATen/CMakeFiles/ATen.dir/native/cpu/ReduceOpsKernel.cpp.AVX2.cpp.o' failed
make[2]: *** [src/ATen/CMakeFiles/ATen.dir/native/cpu/ReduceOpsKernel.cpp.AVX2.cpp.o] Error 1

although the GCC 5.4 docs say it supports -mavx2 (it is an x86 flag, so the aarch64 compiler does not recognize it).
Update: I removed the -mavx and -mavx2 options from the build and it succeeded.

@Kowasaki commented Mar 20, 2018

@thatwist I am facing the exact same problem. Can you show me how you removed it? Thanks!

EDIT: I found it. For those wondering: in pytorch/aten/src/ATen/CMakeLists.txt, change the line LIST(APPEND CPU_CAPABILITY_FLAGS "-O3" "-O3 -mavx" "-O3 -mavx2") to LIST(APPEND CPU_CAPABILITY_FLAGS "-O3" "-O3" "-O3").

@thatwist commented Mar 22, 2018

@Kowasaki I just used grep and sed to remove all the -mavx2 and -mavx strings, something like
grep -rl "\-mavx2" * | xargs sed -i "s/-mavx2//g"
and then
grep -rl "\-mavx" * | xargs sed -i "s/-mavx//g"
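The order of the two commands above matters. A small demo (plain echo/sed, nothing PyTorch-specific):

```shell
# Stripping -mavx2 first, then -mavx, removes both flags cleanly
echo '-O3 -mavx -O3 -mavx2' | sed 's/-mavx2//g; s/-mavx//g'
# prints: -O3  -O3

# In the wrong order, "-mavx" matches inside "-mavx2" and leaves a stray "2"
echo '-O3 -mavx2' | sed 's/-mavx//g'
# prints: -O3 2
```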

@dusty-nv (owner) commented Mar 22, 2018

You may be interested in this script from the jetson-reinforcement repo, which is kept up to date:

https://github.com/dusty-nv/jetson-reinforcement/blob/master/CMakePreBuild.sh

It contains more than just pyTorch, but the pyTorch install works on the TX2 with JetPack 3.2.

@felixendres commented Mar 23, 2018

Like @Hunterhal and @derAtomkeks, I ran into memory issues on the TX1 during sudo python setup.py develop, with

aarch64-linux-gnu-gcc: internal compiler error: Killed (program cc1plus)

I worked around this by pausing the parallel compiler processes with
for pid in $(pidof cc1plus); do echo $pid; sudo kill -sigstop $pid; done
Then I resumed two of them with sudo kill -sigcont <printed-pid> right away, and the other two later, once the rest of the compilations were done.

After compilation I got the message

WARNING: 'develop' is not building C++ code incrementally
because ninja is not installed. Run this to enable it:
pip install ninja

I tried that, but it failed with some other error. But maybe that would have allowed me to just re-trigger the compilation repeatedly?
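The pause/resume trick above can be tried out safely on a harmless process first (here sleep stands in for cc1plus; signal names spelled the way kill accepts them):

```shell
# Demo of pausing and resuming a process with SIGSTOP / SIGCONT
sleep 60 &
pid=$!
kill -STOP "$pid"       # pause: the process stops running (and allocating)
ps -o stat= -p "$pid"   # the state string contains "T" while stopped
kill -CONT "$pid"       # resume once memory pressure has eased
kill "$pid"             # clean up the demo process
```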

@idavis commented Aug 3, 2018

I've used python3 setup.py bdist_wheel and got the same cc1plus error. I solved this by allocating a 4GB swap file which allowed the build to complete.
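For reference, a swap file like the one mentioned can be created roughly as follows; this is only a sketch (the path and size are examples, and the commands require root):

```shell
# Example: allocate a 4 GB swap file (example path; needs root)
sudo fallocate -l 4G /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
free -m   # the Swap row should now show roughly 4096 MB more
```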

@abhanjac commented Aug 8, 2018

I am trying to install pytorch from source on an Odroid XU4 and am getting the following error. The installation goes up to 97% and then breaks.
Can anyone tell me how to fix this?

[ 97%] Linking CXX executable ../../bin/test_jit
[ 97%] Linking CXX executable ../../bin/test_api
../../lib/libtorch.so.1: undefined reference to `dlclose'
../../lib/libtorch.so.1: undefined reference to `dlsym'
../../lib/libtorch.so.1: undefined reference to `dlopen'
../../lib/libtorch.so.1: undefined reference to `dlerror'
collect2: error: ld returned 1 exit status
caffe2/torch/CMakeFiles/test_jit.dir/build.make:97: recipe for target 'bin/test_jit' failed
make[2]: *** [bin/test_jit] Error 1
CMakeFiles/Makefile2:2493: recipe for target 'caffe2/torch/CMakeFiles/test_jit.dir/all' failed
make[1]: *** [caffe2/torch/CMakeFiles/test_jit.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
../../lib/libtorch.so.1: undefined reference to `dlclose'
../../lib/libtorch.so.1: undefined reference to `dlsym'
../../lib/libtorch.so.1: undefined reference to `dlopen'
../../lib/libtorch.so.1: undefined reference to `dlerror'
collect2: error: ld returned 1 exit status
caffe2/torch/CMakeFiles/test_api.dir/build.make:513: recipe for target 'bin/test_api' failed
make[2]: *** [bin/test_api] Error 1
CMakeFiles/Makefile2:2533: recipe for target 'caffe2/torch/CMakeFiles/test_api.dir/all' failed
make[1]: *** [caffe2/torch/CMakeFiles/test_api.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash tools/build_pytorch_libs.sh --use-nnpack caffe2 nanopb libshm gloo THD'
odroid@odroid:~/pytorch$

@dan9thsense commented Sep 13, 2018

Worked like a charm on a Jetson TX2 dev kit with Ubuntu 16.04.
Thanks for providing this script, it's outstanding!

@syedmohsinbukhari commented Nov 2, 2018

Thanks!

@cshreyastech commented Nov 4, 2018

I am trying this on a TX2 and ran into the error below. Has anyone seen this?
running build_ext
-- NumPy not found
-- Detected cuDNN at /usr/lib/aarch64-linux-gnu/libcudnn.so.7, /usr/include/
-- Not using MIOpen
-- Detected CUDA at /usr/local/cuda
-- Not using MKLDNN
-- Building NCCL library
-- Building with THD distributed package
-- Building with c10d distributed package
Traceback (most recent call last):
  File "setup.py", line 1232, in <module>
    rel_site_packages + '/caffe2/**/*.py'
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 523, in run
    setuptools.command.develop.develop.run(self)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/usr/lib/python2.7/dist-packages/setuptools/command/develop.py", line 119, in install_for_development
    self.run_command('build_ext')
  File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 619, in run
    generate_code(ninja_global)
  File "/home/nvidia/pytorch/tools/setup_helpers/generate_code.py", line 84, in generate_code
    from tools.autograd.gen_autograd import gen_autograd
  File "/home/nvidia/pytorch/tools/autograd/gen_autograd.py", line 16, in <module>
    from .utils import YamlLoader, split_name_params
  File "/home/nvidia/pytorch/tools/autograd/utils.py", line 14, in <module>
    from tools.shared.module_loader import import_module
  File "/home/nvidia/pytorch/tools/shared/__init__.py", line 2, in <module>
    from .cwrap_common import set_declaration_defaults,
ImportError: No module named cwrap_common

@jreindel commented Apr 2, 2019

Hi there,
I am trying this on Jetson TX2 and everything completes, but then when I run the test commands I get this:

sudo python3

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.1.0a0+929258a
>>> print(torch.cuda.is_available())
True
>>> a = torch.cuda.FloatTensor(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: unknown error

I am running with the newest edition of L4T just released last month:
L4T 32.1
Ubuntu 18.04
Cuda 10.0 (V10.0.166)
Python 3.6.7

I am wondering if this may be a versioning issue. I did some searching around for that error, but as it lacks any real information ("unknown error"), the results were unhelpful. It did seem that some others (Linux-wide, not Tegra specifically) experienced this issue after upgrading from CUDA 8 to CUDA 9 and had to recompile PyTorch with CUDA 9. I noticed others above mention being on CUDA 9, so I wonder if the issue is that I am on CUDA 10, although PyTorch was compiled using the version of CUDA on my Jetson.

@MaazJamal commented May 2, 2019

Is it supposed to kill so many processes? I do not have swap on my Jetson TX2; should I add swap? Also, the install is failing with the error "Error: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-2edf12aa/". I have upgraded setuptools and installed ezinstall, but it is still giving this error.
