1) Cloned the cudf repo from my fork, created a cudf_xavier branch, and added rapidsai/cudf as the upstream remote, expecting there may be code changes I need to make to cudf that I can capture in this branch.
2) Installed cmake via sudo apt-get install cmake since build.sh wouldn't work without cmake installed
3) That caused problems because the cmake version installed was 3.10.2 and cudf needs >= 3.12 .... let's try something else. SKIP THIS STEP!
4) I wanted to do this without conda, but I'm going to install conda and use the cmake it provides.
5) Of course, Anaconda does not officially support ARM64, so that route is not going to work ... on to something else.
6) I could build the latest version of cmake from source ... let's try that. CMake does not offer prebuilt binaries for ARM64, so building it is.
7) cd /tmp && wget https://github.com/Kitware/CMake/releases/download/v3.16.2/cmake-3.16.2.tar.gz && tar -xzvf ./cmake-3.16.2.tar.gz && cd cmake-3.16.2 && ./bootstrap && make && make install
7A) Already starting to see why cross-compiling is basically going to be required to make this work smoothly ...
8) I'm starting to see this is going to be a very, very complicated setup. I'm going to use these notes to make an AWS/GCP VM image for others to use, and also an Ansible script that lets people set up cross-compiling on their NVIDIA workstations. I would like to run with this and make something really good for others to use. We could also use it as part of our CI environment.
9) The cmake compile takes a good bit of time, which is unfortunate. Maybe I should save the binary in S3 or something so others can pull it directly instead of rebuilding. Will investigate.
10) The cmake build failed ... trying sudo apt-get install libssl-dev to see if that moves it along. Surprisingly, that did the trick! Can move on to running make now ... will report back.
11) Ok, make worked. Let's install.
12) make install did not work ... it was just missing sudo: sudo make install.
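With that, cmake 3.16.2 should be the one on PATH; a quick sanity check (assuming /usr/local/bin comes before /usr/bin):
$ hash -r && cmake --version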
13) Ok, tried running ./build.sh again ... Boost was missing, so installed it with sudo apt-get install -y libboost-all-dev ... waiting ... then trying ./build.sh again.
14) Every single time this gets me ... you have to remember to check out the git submodules: git submodule update --init --remote --recursive
15) Ok, getting several cmake errors, likely because dependencies like RMM do not exist yet, so here we go down the rabbit hole of building those first.
16) add export CUDA_HOME=/usr/local/cuda-10.0 to ~/.profile
17) add export CUDACXX=/usr/local/cuda-10.0/bin/nvcc to ~/.profile
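For reference, both can be appended and picked up without logging out (paths assume the stock JetPack CUDA 10.0 location):
$ echo 'export CUDA_HOME=/usr/local/cuda-10.0' >> ~/.profile
$ echo 'export CUDACXX=/usr/local/cuda-10.0/bin/nvcc' >> ~/.profile
$ source ~/.profile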
18) cd ~/Development && git clone https://github.com/rapidsai/rmm && cd rmm && git submodule update --init --remote --recursive && ./build.sh && sudo --preserve-env=CUDA_HOME ./build.sh
19) I was seeing some confusing syntax errors and realized that Python 2.7 is installed by default, so I needed to install Python 3.7.
20) sudo apt-get install python3.7
21) Add alias python=python3 to ~/.bashrc to make the python command run python3.
22) Adding the alias didn't work under sudo, so updated the alternatives instead:
23) sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
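A quick sanity check that both the regular shell and sudo now resolve python to python3:
$ python --version && sudo python --version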
24) Ok, that worked, but now Cython is missing, so I guess we're at the point where required Python modules start going missing.
25) Thinking about the best way to install Python modules ...
26) I really didn't want to, but avoiding it just seems painful, so I'm going to use pip.
27) sudo apt-get install python3-pip
28) Add alias pip=pip3 to ~/.bashrc.
29) pip install cython -- takes a good bit of time; I was a little concerned it took so long.
30) The RMM build works now; I still needed to install numpy, which is a runtime dependency, however.
31) pip install numpy
32) pip install numba
33) Errors while trying to pip install numba. Seems to need llvm? Trying sudo apt-get install llvm-7
34) Lots of good info here: https://github.com/jefflgaol/Install-Packages-Jetson-ARM-Family
35) $ wget http://releases.llvm.org/7.0.1/llvm-7.0.1.src.tar.xz
$ tar -xvf llvm-7.0.1.src.tar.xz
$ cd llvm-7.0.1.src
$ mkdir llvm_build_dir
$ cd llvm_build_dir/
$ cmake ../ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64"
$ make -j4
$ sudo make install
$ cd bin/
$ echo "export LLVM_CONFIG=\""`pwd`"/llvm-config\"" >> ~/.bashrc
$ echo "alias llvm='"`pwd`"/llvm-lit'" >> ~/.bashrc
$ source ~/.bashrc
$ pip install llvmlite
36) pip install numba should work now.
37) python -c "import rmm" ... works now!
38) Tried to build cudf again, but cmake is complaining about dlpack not being available.
39) dlpack is not available via apt.
40) Attempting to install it via pip to see what happens.
41) The pip install did not work; apparently only the headers are needed.
42) git clone https://github.com/dmlc/dlpack.git
43) You can use the DLPACK_ROOT environment variable, which the cmake file will use to locate the dlpack headers. Let's set that variable and reboot.
44) In /etc/profile, set export DLPACK_ROOT=/home/jdyer/Development/dlpack
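If you would rather not reboot, the same line can be appended and sourced in the current shell (a sketch; the path matches the clone above):
$ echo 'export DLPACK_ROOT=/home/jdyer/Development/dlpack' | sudo tee -a /etc/profile
$ source /etc/profile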
45) Got errors when running ./build.sh ... it needs to be run with sudo, so sudo ./build.sh.
46) Errors about cmake and setuptools?
47) pip install wheel
48) pip install cmake_setuptools
49) Running sudo -E ./build.sh caused the Xavier to turn itself off after a while, apparently from heat. The script attempts to use all cores; I don't think that will fly. Going to try building with a single thread now.
50) cd ~/Development/cudf/cpp/build && sudo make
51) Build failed because DLPACK wasn't found by the tests build ... for now I turned off building the tests by default.
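For reference, a quick way to do that by hand is at configure time; BUILD_TESTS is assumed to be the option name libcudf's CMakeLists exposes for this:
$ cd ~/Development/cudf/cpp/build && cmake -DBUILD_TESTS=OFF .. && sudo make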
52) Ran sudo -E ./build.sh after make finished, because things like Arrow and other libraries were not built by that make command.
53) The Cython build failed due to missing pyarrow. Needed to install it: pip install pyarrow
54) I imagine the Arrow library built as part of libcudf will work fine here? Let's find out.
55) pip install fastavro
56) pip install fsspec
57) pip install pandas
58) pip install cupy
59) pyarrow is still not installing, likely because it cannot find the Arrow library. Investigating whether there are environment variables or something I can set. It's failing in cmake's find_package as part of the pip install.
60) Need to set ARROW_HOME so the pip install of pyarrow knows where to locate the Arrow libraries and headers.
61) In /etc/profile, set export ARROW_HOME=/home/jdyer/Development/cudf/cpp/build/arrow/install
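With the variable in place (after sourcing /etc/profile or logging back in), retry the pyarrow build; there are no aarch64 wheels, so pip always builds it from source anyway:
$ source /etc/profile && pip install pyarrow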
62) That is still missing the Python bindings. Changing https://github.com/rapidsai/cudf/blob/branch-0.12/cpp/cmake/Modules/ConfigureArrow.cmake#L21 to make sure that the Python libraries are built.
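The change amounts to flipping the Arrow build option that ConfigureArrow.cmake passes to the external Arrow project; ARROW_PYTHON is Arrow 0.15's switch for building libarrow_python, and treating it as the flag on that exact line is an assumption:
  -DARROW_PYTHON=ON    # was OFF; builds libarrow_python, which the pip pyarrow build links against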
63) That worked, but now: fatal error: arrow/compute/api.h: No such file or directory
64) Need to use version 0.15.1 of Apache Arrow, because version 0.15.0, which is currently used, is missing some header files needed by the pip pyarrow installation. https://github.com/rapidsai/cudf/blob/branch-0.12/cpp/cmake/Templates/Arrow.CMakeLists.txt.cmake#L7
65) After changing that, another build ran and more things needed to change.
66) pip install 'pyarrow==0.15.1'
67) Things are still failing ... it seems I actually didn't need to change the Arrow version from 0.15.0 -> 0.15.1, so skip those steps when doing this again.
68) Trying to figure out how to get those "compute" headers included. Likely a CMake flag.
69) Yep, need to add -DARROW_COMPUTE=ON. This is set to OFF in the cmake/Modules/ConfigureArrow.cmake file; changed it to ON.
70) Now libarrow.so isn't being found ... looks like only libarrow.a is present since cudf builds Arrow as a static library. Let me see if there is a way to build the shared object.
71) Yep, set BUILD_SHARED_LIBRARY from OFF to ON.
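Rolled together, the ConfigureArrow.cmake tweaks on this platform look roughly like the sketch below; the names are Arrow 0.15 build options (upstream calls the shared-library switch ARROW_BUILD_SHARED), so the exact spelling in the cudf file may differ:
  -DARROW_COMPUTE=ON        # was OFF; installs arrow/compute/api.h
  -DARROW_PYTHON=ON         # was OFF; builds libarrow_python
  -DARROW_BUILD_SHARED=ON   # was OFF; produces libarrow.so rather than only libarrow.a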
72) Need to add the ARROW_HOME/include path to cudf/setup.py so that Arrow can be found when compiling the cythonized C++.
73) Need to add /include to cudf/setup.py so that RMM can be located there. In fact, it looks like Arrow is there as well, so no need for the step above, but I'm going to leave it for now.
74) Need to add the dlpack headers path /home/jdyer/Development/dlpack/include to cudf/setup.py.
TODO:
1) Add flag to build.sh to disable building tests
2) Change the DLPACK "thirdparty" bit that is left in the cmake test file
3) Make virtualenv for python install
4) Create Ansible playbook for installing on Xavier device
5) Pip freeze environment
6) Package pre-built binaries to share with others
cuML:
1) cd ~/Development && git clone https://github.com/rapidsai/cuml.git
2) In this setup I have already built cuDF from source, which means many of the expected dependencies are already present, so no need to build those again here. The script should check for their presence first, however.
3) cd ~/Development/cuml/cpp && mkdir build && cd build
4) cmake ..
5) Ok, protobuf is missing. That makes sense because cuDF does not have protobuf as a dependency ... let's build protobuf from source, mostly so we can control which version we use and not depend on the version provided by the OS package manager ... remember, conda is not a viable option here, otherwise that would be the best option.
6) cd ~/Development && git clone https://github.com/google/protobuf.git && cd protobuf && git submodule update --init --recursive && ./autogen.sh && ./configure && make -j4 && sudo make install && sudo ldconfig
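An optional quick check that the install landed and the runtime linker can see it:
$ protoc --version
$ ldconfig -p | grep libprotobuf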
7) After the above completes we should have protobuf installed and can try again.
8) cd ~/Development/cuml/cpp/build && cmake ..
9) Protobuf is now found, good. However, the NCCL library is missing, so let's build that.
10) cd ~/Development && git clone https://github.com/NVIDIA/nccl.git && cd nccl && make -j4 src.build && sudo make install
11) cd ~/Development/cuml/cpp/build && cmake ..
12) cmake now complains that no BLAS library can be found, so build OpenBLAS: cd ~/Development && git clone https://github.com/xianyi/OpenBLAS.git && cd OpenBLAS && make && sudo make install
13) cd ~/Development/cuml/cpp && rm -rf ./build && mkdir build && cd build && cmake -DBLAS_LIBRARIES=/opt/OpenBLAS/lib/libopenblas.so ..
14) LAPACK is now missing, so we need to install that.
15) cd ~/Development && git clone https://github.com/Reference-LAPACK/lapack.git && cd ~/Development/lapack && mkdir build && cd build && cmake .. && cmake --build . -j4 && sudo make install
16) A Fortran compiler is missing, so need to install that now ...
17) sudo apt-get install gfortran
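With gfortran installed, the LAPACK configure and build can be retried from a clean build directory (a sketch; -j4 kept modest given the thermal shutdowns seen earlier):
$ cd ~/Development/lapack && rm -rf build && mkdir build && cd build
$ cmake .. && make -j4 && sudo make install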