@abishekmuthian
Last active August 1, 2022 16:31
Building Apache Arrow and pyarrow on ARMv8

Why build Apache Arrow from source on ARM?

Apache Arrow is an in-memory columnar data format used in several projects. Its Python module, pyarrow, can be used from Python code to save in-memory data to disk, which is common in machine learning projects. ARM devices with little RAM can benefit from it, but there seems to be a configuration error in the packaged binaries as of version 0.15.1, so we're forced to build and install from source.

The build and installation steps are based on the official guidelines, modified for ARM, and take cues from building Ray for ARM.

My setup

I'm using an NVIDIA Jetson Nano.

Quad-core ARM® Cortex®-A57 MPCore processor

NVIDIA Maxwell™ architecture with 128 NVIDIA CUDA® cores

4 GB 64-bit LPDDR4 1600MHz - 25.6 GB/s

Ubuntu 18.04 LTS

Python 3.6.9

Preparing the environment

I have created a separate directory for building Arrow and downloaded the sources into it. Download the Apache Arrow sources from https://github.com/apache/arrow/releases.

mkdir build
cd build
wget https://github.com/apache/arrow/archive/apache-arrow-0.15.1.zip
unzip apache-arrow-0.15.1.zip
cd arrow-apache-arrow-0.15.1/cpp
mkdir release
cd release
export ARROW_HOME=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Note: /usr/local/lib is the path where the arrow *.so files would finally be installed.

Installing Dependencies

Depending on your setup, some of these packages may already be installed; if so, skip them in this step.

sudo apt-get install libjemalloc-dev libboost-dev \
                       libboost-filesystem-dev \
                       libboost-system-dev \
                       libboost-regex-dev \
                       python3-dev \
                       autoconf \
                       flex \
                       bison \
                       libssl-dev \
                       curl \
                       cmake
pip3 install six numpy pandas cython pytest psutil                       
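
To see which of the apt packages above are still missing before installing, something like the following could be used. It is a minimal sketch that assumes a Debian-based system where dpkg-query is available; the function name missing_packages is my own and not part of any tool.

```shell
#!/bin/sh
# Print every package from the argument list that is not reported as installed.
# Assumption: Debian/Ubuntu system; missing_packages is a hypothetical helper name.
missing_packages() {
    for pkg in "$@"; do
        # An installed package has a status that ends in "ok installed".
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'ok installed'; then
            echo "$pkg"
        fi
    done
}

missing_packages libjemalloc-dev libboost-dev libboost-filesystem-dev \
                 libboost-system-dev libboost-regex-dev python3-dev \
                 autoconf flex bison libssl-dev curl cmake
```

Only the names it prints need to be passed to sudo apt-get install.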

Build the cpp files & install the binary

I have built with all possible components to showcase the best-case scenario. You likely won't need several of these components, so check what each one does and enable only those you need.

-DARROW_CUDA=ON because I have a CUDA-capable ARM board. If you don't have an NVIDIA ARM board, you don't need this.

-DPYTHON_EXECUTABLE=/usr/bin/python3 because my python3 resides at this path; replace it with your python3 path if required.

make -j4 because my board has a quad-core CPU, and building with 4 parallel jobs improves the build time significantly. Change this flag to match the number of cores and threads available on your CPU.

sudo make install installs the compiled binaries (*.so) in the aforementioned directory.

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=ON \
      -DARROW_GANDIVA=ON \
      -DARROW_ORC=ON \
      -DARROW_WITH_BZ2=ON \
      -DARROW_WITH_ZLIB=ON \
      -DARROW_WITH_ZSTD=ON \
      -DARROW_WITH_LZ4=ON \
      -DARROW_WITH_SNAPPY=ON \
      -DARROW_WITH_BROTLI=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_CUDA=ON \
      -DARROW_BUILD_TESTS=ON \
      -DPYTHON_EXECUTABLE=/usr/bin/python3 \
      ..
make -j4
sudo make install
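
Instead of hard-coding 4, the job count can be derived from the machine. This is a minimal sketch, assuming the coreutils nproc command is available (it falls back to a single job otherwise); JOBS is just an illustrative variable name. The snippet only prints the resulting command so it can be checked first.

```shell
# Pick the number of parallel make jobs from the CPU count.
# Assumption: nproc (GNU coreutils) exists; otherwise fall back to one job.
JOBS="$(nproc 2>/dev/null || echo 1)"

# Show the command that would be run from the release directory.
echo "make -j${JOBS}"
```

Each parallel job needs its own share of memory, so on a board with little RAM a value below the core count may be the safer choice.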

Build and install pyarrow

As with Arrow cpp, not all environment variables are required for building and installing pyarrow. If you enabled a component when building the cpp files, you'll likely need the matching variable here as well.

cd ../../python

pip3 install -r requirements.txt

export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_CUDA=1
export PYARROW_WITH_PLASMA=1

python3 setup.py build_ext --inplace

sudo -E python3 setup.py install

Note: If you build and install on your ARM box over several sessions, you may lose the environment variables. Ensure the required environment variables are set before building and installing. If you're using sudo to install, use sudo -E to pass the environment variables through to sudo.
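
One way to avoid losing the variables between sessions is to keep them in a small file and load it before each build. The sketch below assumes a file named pyarrow_env.sh in the current directory; that name is hypothetical and anything else works equally well. Keep only the switches that match the components enabled in the cpp build.

```shell
# Write the pyarrow build switches to a file once.
# Assumption: ./pyarrow_env.sh is a hypothetical file name chosen for this sketch.
ENV_FILE="./pyarrow_env.sh"
cat > "$ENV_FILE" <<'EOF'
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_CUDA=1
export PYARROW_WITH_PLASMA=1
EOF

# Load the file at the start of every new shell session before building.
. "$ENV_FILE"

# List what is currently set so a missing switch is easy to spot.
env | grep '^PYARROW_WITH_' | sort
```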

Add the library path to LD_LIBRARY_PATH

LD_LIBRARY_PATH must include the Arrow library directory for arrow and pyarrow to function properly. Add the export line to ~/.bashrc.

In my case,

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
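
Appending that line by hand more than once leaves duplicates in the file. A small helper that appends only when the exact line is missing could look like this. It is a sketch: add_line_once is a made-up name, and the demonstration writes to a scratch file created with mktemp rather than to the real ~/.bashrc.

```shell
# Append a line to a file only if that exact line is not already present.
# Assumption: add_line_once is a hypothetical helper, not an existing tool.
add_line_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration on a scratch file: the second call changes nothing.
SCRATCH="$(mktemp)"
add_line_once "$SCRATCH" 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH'
add_line_once "$SCRATCH" 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH'
cat "$SCRATCH"
```

For the real thing, pass ~/.bashrc as the first argument instead of the scratch file, then open a new shell or source the file.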

Test Pyarrow

I tested pyarrow by importing it in the python command line.

python3 -v
from pyarrow import compat

If the above import statement didn't result in an error, then it's all good. If it did, ensure LD_LIBRARY_PATH is set correctly, as explained in the previous section.
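
The same check can be run without opening the interpreter, which is handy in a build script. This is a minimal sketch that assumes python3 is on the PATH; can_import is a hypothetical helper name.

```shell
# Report whether a Python module can be imported, without opening the interpreter.
# Assumption: can_import is a hypothetical helper; python3 is on the PATH.
can_import() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "failed: $1"
    fi
}

can_import pyarrow
```

If it prints failed, running python3 -c "import pyarrow" directly shows the actual error message.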

z14git commented Nov 9, 2020

Having the same issue building for the TX2. I'm trying to build version 0.17.1 as it's a required dependency for tensorflow 2.3.
When doing the arch hack it seems to work, but then it's not able to find the Arrow libs even though I set the path explicitly for the Python cmake:
PYARROW_CMAKE_OPTIONS="-DARROW_LIB_DIR=/usr/local/lib/libarrow/" python3 setup.py build_ext --inplace

This is the error that I get. I even tried moving those FindCmakes to /usr/share/cmake-3.10/Modules/

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find Arrow (missing: ARROW_LIB_DIR) (found version "0.17.1")
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindArrow.cmake:412 (find_package_handle_standard_args)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:210 (find_package)

I am out of ideas; any input is welcome :)

Try export ARROW_HOME=/usr/local instead of export ARROW_HOME=/usr/local/lib before running cmake.
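
The likely reason this helps, assuming the cmake flags from the guide (-DCMAKE_INSTALL_PREFIX=$ARROW_HOME together with -DCMAKE_INSTALL_LIBDIR=lib): a relative libdir is resolved below the install prefix, so the libraries end up in a lib directory under ARROW_HOME. The sketch below only illustrates that path arithmetic; libdir_for is a made-up helper, not a CMake command.

```shell
# Illustrate how an install prefix and a relative libdir combine into one path.
# Assumption: libdir_for is a hypothetical helper used only for this illustration.
libdir_for() {
    # $1 = install prefix (ARROW_HOME), $2 = relative CMAKE_INSTALL_LIBDIR
    printf '%s/%s\n' "${1%/}" "$2"
}

libdir_for /usr/local/lib lib   # prints /usr/local/lib/lib
libdir_for /usr/local lib       # prints /usr/local/lib
```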

pandeyp commented Feb 3, 2021

Can I build a wheel that bundles the .so files into the wheel itself and delivers them to site-packages, as is done on normal x86 instances, without having to manually export LD_LIBRARY_PATH? A manual export requires restarting the process that is already running, which is problematic for my use case.

alexander-pv commented Feb 12, 2021

OK - I have a build running. I'll post the script when it's finished. Meanwhile, the trick is that you only install the apt packages it needs to complete the cmake step successfully. After that, the make will download the source and compile anything you didn't already have, for example parquet.

I'm having the exact issue as jakethequid. cmake and make compile, but with 'python3 setup.py build_ext --inplace' I get "No package 'parquet' found". Did you post the build? Thank you.

Hello. Perhaps this answer comes very late, but I decided to write here to help others, since I recently set up the build for a Jetson device. There are many options defined in /arrow/python/setup.py. For example, to build and install pyarrow with Parquet support, you can write:

$ sudo -E python3 setup.py build_ext --with-parquet install

With CUDA support:

$ sudo -E python3 setup.py build_ext --with-cuda install
