@abishekmuthian
Last active August 1, 2022 16:31
Building Apache Arrow and pyarrow on ARMv8

Why build Apache Arrow from source on ARM?

Apache Arrow is an in-memory columnar data format used in several projects. Its Python module, pyarrow, can save what's in memory to disk from Python code and is commonly used in machine learning projects. ARM devices with little RAM can make good use of it, but as of version 0.15.1 there seems to be a configuration error in the packaged binaries, so we're forced to build and install from source.

The build steps are based on the official guidelines but modified for ARM, and take cues from building Ray for ARM.

My setup

I'm using an Nvidia Jetson Nano.

Quad-core ARM® Cortex®-A57 MPCore processor

NVIDIA Maxwell™ architecture with 128 NVIDIA CUDA® cores

4 GB 64-bit LPDDR4 1600MHz - 25.6 GB/s

Ubuntu 18.04 LTS

Python 3.6.9

Preparing the environment

I created a separate directory for building Arrow and downloaded the sources into it. The Apache Arrow sources are available from https://github.com/apache/arrow/releases.

mkdir build
cd build
wget https://github.com/apache/arrow/archive/apache-arrow-0.15.1.zip
unzip apache-arrow-0.15.1.zip
cd arrow-apache-arrow-0.15.1/cpp
mkdir release
cd release
export ARROW_HOME=/usr/local
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Note: /usr/local/lib is the path where the arrow *.so files would finally be installed.
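The directory the libraries end up in is the install prefix (ARROW_HOME here) joined with CMAKE_INSTALL_LIBDIR, which the cmake step in this guide sets to lib. A minimal sketch of that rule follows, so the result can be compared with the directory on LD_LIBRARY_PATH before building. The helper name arrow_lib_dir is made up for illustration and is not part of Arrow or CMake.

```shell
# arrow_lib_dir PREFIX LIBDIR  (hypothetical helper, for illustration only)
# Prints the directory where `make install` would place the shared libraries.
arrow_lib_dir() (
    prefix=$1
    libdir=$2
    case "$libdir" in
        /*) printf '%s\n' "$libdir" ;;              # absolute libdir is used as-is
        *)  printf '%s\n' "${prefix%/}/$libdir" ;;  # relative libdir is joined to the prefix
    esac
)

arrow_lib_dir /usr/local lib        # prints /usr/local/lib
arrow_lib_dir /usr/local/lib lib    # prints /usr/local/lib/lib
```

If the printed directory differs from the one exported in LD_LIBRARY_PATH, one of the two needs adjusting.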

Installing Dependencies

Depending on your setup, some of these packages may already be installed; if so, skip them in this step.

sudo apt-get install libjemalloc-dev libboost-dev \
                       libboost-filesystem-dev \
                       libboost-system-dev \
                       libboost-regex-dev \
                       python3-dev \
                       autoconf \
                       flex \
                       bison \
                       libssl-dev \
                       curl \
                       cmake
pip3 install six numpy pandas cython pytest psutil                       
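To see which of the packages above still need installing, something like the following sketch may help. It assumes a Debian-based system with dpkg-query available; missing_packages is a hypothetical helper name, and a package is also reported as missing when dpkg-query itself cannot be run.

```shell
# missing_packages PKG...  (hypothetical helper, for illustration only)
# Prints each package that dpkg does not report as fully installed.
missing_packages() (
    for pkg in "$@"; do
        # Empty status if the package is unknown or dpkg-query is unavailable.
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        case "$status" in
            "install ok installed") ;;       # already present, nothing to print
            *) printf '%s\n' "$pkg" ;;       # not installed
        esac
    done
)

missing_packages libjemalloc-dev libboost-dev libssl-dev cmake flex bison
```

The printed names can then be passed to sudo apt-get install.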

Build the cpp files & install the binary

I built with all possible components to showcase the best-case scenario. You likely won't need several of them, so check what each component does before enabling it.

-DARROW_CUDA=ON because I have a CUDA-capable ARM board. If you don't have an Nvidia ARM board, you don't need this.

-DPYTHON_EXECUTABLE=/usr/bin/python3 because my python3 resides in this path; replace it with your python3 path if required.

make -j4 because my board has a quad-core CPU, and building with 4 parallel jobs improves the build time significantly. Adjust this flag to the number of cores and threads available on your CPU.

make install installs the compiled binaries (*.so) in the aforementioned directory.

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=ON \
      -DARROW_GANDIVA=ON \
      -DARROW_ORC=ON \
      -DARROW_WITH_BZ2=ON \
      -DARROW_WITH_ZLIB=ON \
      -DARROW_WITH_ZSTD=ON \
      -DARROW_WITH_LZ4=ON \
      -DARROW_WITH_SNAPPY=ON \
      -DARROW_WITH_BROTLI=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_CUDA=ON \
      -DARROW_BUILD_TESTS=ON \
      -DPYTHON_EXECUTABLE=/usr/bin/python3 \
      ..
make -j4
sudo make install
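On a board with 4 GB of RAM, memory may limit a parallel build as much as the core count does. The sketch below picks a job count from both. The figure of 1024 MB per compile job is an assumption for illustration rather than a measured value, the 1024 MB fallback is arbitrary, and pick_jobs is a hypothetical helper name.

```shell
# pick_jobs CORES MEM_MB PER_JOB_MB  (hypothetical helper, for illustration only)
# Prints the smaller of CORES and MEM_MB / PER_JOB_MB, but never less than 1.
pick_jobs() (
    cores=$1
    mem_mb=$2
    per_job_mb=$3
    by_mem=$((mem_mb / per_job_mb))
    jobs=$cores
    if [ "$by_mem" -lt "$jobs" ]; then
        jobs=$by_mem
    fi
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    printf '%s\n' "$jobs"
)

# Read core count and total memory, with fallbacks when they are unavailable.
cores=$(nproc 2>/dev/null || echo 1)
mem_mb=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo 2>/dev/null || true)
[ -n "$mem_mb" ] || mem_mb=1024   # arbitrary fallback when /proc/meminfo is missing

echo "make -j$(pick_jobs "$cores" "$mem_mb" 1024)"
```

The script only prints the suggested make command; run it by hand once the number looks reasonable for your board.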

Build and install pyarrow

As with Arrow C++, not all environment flags are required for building and installing pyarrow. If you enabled a component when building the C++ files, you'll likely need the matching flag here as well.

cd ../..
cd python/

pip3 install -r requirements.txt

export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_CUDA=1
export PYARROW_WITH_PLASMA=1

python3 setup.py build_ext --inplace

sudo -E python3 setup.py install

Note: If you build and install on your ARM box in several sessions, you may lose the environment flags in between. Ensure the required flags are set before building and installing. If you're using sudo to install, use sudo -E to pass the environment flags through to sudo.
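Because the flags are easy to lose between sessions, it can help to derive them from the cmake options used for the C++ build. The following is a minimal sketch covering only the six components exported in this guide; pyarrow_exports is a hypothetical helper name, not something provided by Arrow.

```shell
# pyarrow_exports FLAG...  (hypothetical helper, for illustration only)
# For each -DARROW_<NAME>=ON flag that has a PYARROW_WITH_<NAME> counterpart
# in this guide, print the matching export line. Other flags are ignored.
pyarrow_exports() (
    for flag in "$@"; do
        case "$flag" in
            -DARROW_FLIGHT=ON|-DARROW_GANDIVA=ON|-DARROW_ORC=ON|-DARROW_PARQUET=ON|-DARROW_CUDA=ON|-DARROW_PLASMA=ON)
                name=${flag#-DARROW_}   # strip the prefix, leaving e.g. PARQUET=ON
                name=${name%=ON}        # strip the value, leaving e.g. PARQUET
                printf 'export PYARROW_WITH_%s=1\n' "$name"
                ;;
        esac
    done
)

pyarrow_exports -DARROW_PARQUET=ON -DARROW_WITH_ZLIB=ON -DARROW_CUDA=ON
```

Wrapping the call as eval "$(pyarrow_exports ...)" would set the variables in the current shell before running setup.py.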

Add the library directory to LD_LIBRARY_PATH

LD_LIBRARY_PATH must be set for arrow and pyarrow to function properly. Add the export line to ~/.bashrc.

In my case,

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
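Re-running a setup script should not stack up duplicate export lines. A minimal sketch of an idempotent append follows; it is demonstrated on a temporary file, and add_line_once is a hypothetical helper name.

```shell
# add_line_once FILE LINE  (hypothetical helper, for illustration only)
# Appends LINE to FILE unless an identical line is already present.
add_line_once() (
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
)

# Demonstration on a scratch file; calling twice leaves a single line.
rc_file=$(mktemp)
add_line_once "$rc_file" 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH'
add_line_once "$rc_file" 'export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH'
cat "$rc_file"
rm -f "$rc_file"
```

For real use, pass "$HOME/.bashrc" as the file and open a new shell (or source the file) afterwards.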

Test Pyarrow

I tested pyarrow by importing it in the python command line.

python3 -v
from pyarrow import compat

If the import statement above doesn't raise an error, all is good. If it does, ensure LD_LIBRARY_PATH is set correctly, as explained in the previous section.
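The same check can be run non-interactively, which is handy at the end of a build script. This is a minimal sketch; check_import is a hypothetical helper name, and it also reports FAILED when python3 itself is not found.

```shell
# check_import MODULE  (hypothetical helper, for illustration only)
# Prints "ok" if python3 can import MODULE, otherwise "FAILED".
check_import() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "FAILED"
    fi
}

check_import pyarrow
```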

@cyb70289

Very useful. Thanks.

@MyMakibox

Thanks for this writeup!! I wanted pyarrow to test out kedro.

There were a few challenges:

Building on a Jetson Nano, before cmake, I needed

sudo apt-get install llvm-7 clang

Then, for build and install pyarrow, i needed

cd ../..

to exit cpp/release directory and before cd python.

Then, the library files were installed to

/usr/local/lib/lib

while LD_LIBRARY_PATH pointed to /usr/local/lib.

So I modified with

sudo mv /usr/local/lib/lib /usr/local/lib/libarrow
export LD_LIBRARY_PATH=/usr/local/lib/libarrow:$LD_LIBRARY_PATH
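
For anyone hitting the same mismatch, a quick way to see where the libraries actually landed is to search the install tree for libarrow.so files. The sketch below does that; find_arrow_libs is a hypothetical helper name.

```shell
# find_arrow_libs ROOT  (hypothetical helper, for illustration only)
# Prints each directory under ROOT that contains a libarrow.so* file.
find_arrow_libs() {
    find "$1" -name 'libarrow.so*' 2>/dev/null |
        sed 's|/[^/]*$||' |   # drop the file name, keep the directory
        sort -u               # one line per directory
}

find_arrow_libs /usr/local
```

Whatever directory it prints is the one LD_LIBRARY_PATH needs to include.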

@abishekmuthian
Author

@MyMakibox

Building on a Jetson Nano, before cmake, I needed

sudo apt-get install llvm-7 clang

Interesting. Although I did install clang from source during my initial troubleshooting for this install, it didn't matter for the successful compilation in the final attempt detailed above, as that compiled via GCC.

Thanks for sharing.

@victoryang00

great

@jakethesquid

Hi heavyinfo, thanks for this write up it has been very useful since i'm attempting to get this (and CuDF) working on a Jetson TX1.

I am having some issue with running "python3 setup.py build_ext --inplace" from the python folder, where i get the error:

-- Checking for module 'parquet'
-- No package 'parquet' found
-- Could not find the parquet library. Looked in system search paths.
CMake Error at CMakeLists.txt:419 (message):
Unable to locate Parquet libraries

I have verified that libparquet.so exists in /usr/local/lib/lib/ and even tried creating a sym link in the python folder.

Any ideas on how to get past this error?

@znmeb

znmeb commented Jul 11, 2020

Hi heavyinfo, thanks for this write up it has been very useful since i'm attempting to get this (and CuDF) working on a Jetson TX1.

I am having some issue with running "python3 setup.py build_ext --inplace" from the python folder, where i get the error:

-- Checking for module 'parquet'
-- No package 'parquet' found
-- Could not find the parquet library. Looked in system search paths.
CMake Error at CMakeLists.txt:419 (message):
Unable to locate Parquet libraries

I have verified that libparquet.so exists in /usr/local/lib/lib/ and even tried creating a sym link in the python folder.

Any ideas on how to get past this error?

You may need the C header files for libparquet - is there an APT package called libparquet-dev and if so, is it installed?

@jakethesquid

I've had a look (using apt list --installed) and there are no libparquet packages installed, so i tried running sudo apt install libparquet-dev and got the error message: E: Unable to locate package libparquet-dev

Should it be as simple as running the apt get command to install that package? Or is there something else i'm missing?

@znmeb

znmeb commented Jul 11, 2020

I've had a look (using apt list --installed) and there are no libparquet packages installed, so i tried running sudo apt install libparquet-dev and got the error message: E: Unable to locate package libparquet-dev

Should it be as simple as running the apt get command to install that package? Or is there something else i'm missing?

It's probably not in the repositories then, which means you'll need to build Parquet from source. Let me check it on my Nano.

@znmeb

znmeb commented Jul 11, 2020

OK - I have a build running. I'll post the script when it's finished. Meanwhile, the trick is that you only install the apt packages it needs to complete the cmake step successfully. After that, the make will download the source and compile anything you didn't already have, for example parquet.

@znmeb

znmeb commented Jul 12, 2020

#! /bin/bash

# Linux dependencies
sudo apt-get install \
  autoconf \
  bison \
  clang-7 \
  cmake \
  curl \
  flex \
  libboost-dev \
  libboost-filesystem-dev \
  libboost-regex-dev \
  libboost-system-dev \
  libjemalloc-dev \
  libssl-dev \
  llvm-7-dev \
  python3-dev

rm -fr apache-arrow-*.zip
wget -q https://github.com/apache/arrow/archive/apache-arrow-0.15.1.zip
rm -fr arrow-apache-arrow-*
unzip -qq apache-arrow-0.15.1.zip
cd arrow-apache-arrow-0.15.1/cpp
mkdir release
cd release
export ARROW_HOME=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib  \
      -DARROW_FLIGHT=ON \
      -DARROW_GANDIVA=ON  \
      -DARROW_ORC=ON  \
      -DARROW_WITH_BZ2=ON \
      -DARROW_WITH_ZLIB=ON  \
      -DARROW_WITH_ZSTD=ON  \
      -DARROW_WITH_LZ4=ON \
      -DARROW_WITH_SNAPPY=ON  \
      -DARROW_WITH_BROTLI=ON  \
      -DARROW_PARQUET=ON  \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_CUDA=ON \
      -DARROW_BUILD_TESTS=ON  \
      -DPYTHON_EXECUTABLE=/usr/bin/python3  \
      ..

make --jobs=`nproc`
sudo make install

@jakethesquid

Thanks for putting that script together. However, after running it successfully, I still get the same error when running the python3 setup.py build_ext --inplace line. Really not sure what's going on here; it seems like parquet builds fine...

@TristanShoemaker

Hello,

I've been following this very useful guide trying to get pyarrow running on a raspi 4.

Unfortunately, I've run into an error when running python3 setup.py build_ext --inplace

CMake Error at cmake_modules/SetupCxxFlags.cmake:368 (message): Unsupported arch flag: -march=.

I found a JIRA post about possible hardcoding in -march=armv8-a at the offending line, but this results in the same error.

Anybody have ideas? I'm not very familiar with cmake/ARM flags.
Thanks!

@abishekmuthian
Author

abishekmuthian commented Jul 27, 2020

Thanks for putting that script together. However, after running it successfully, I still get the same error when running the python3 setup.py build_ext --inplace line. Really not sure what's going on here; it seems like parquet builds fine...

Can you try sudo ldconfig and also export the appropriate LD_LIBRARY_PATH for the installed dependencies?

Note that if you are using sudo to build, the environment variables might not get passed, especially LD_LIBRARY_PATH; even sudo -E works only for ordinary environment variables and not for LD_LIBRARY_PATH. In that case you need to pass LD_LIBRARY_PATH after sudo, along with the build command.

P.S. Thanks to everyone for helping each other in this thread, I appreciate it.

@abishekmuthian
Author

Unsupported arch flag: -march=.

Are you on Docker too, as in the JIRA issue you mentioned - https://issues.apache.org/jira/browse/ARROW-8992?

@TristanShoemaker

Are you on the docker too?

No I'm not using docker, I've followed your build commands (minus some of the flags that I didn't need).

@austinjp

austinjp commented Jul 29, 2020

Just another data point, no solutions for @TristanShoemaker unfortunately. I'm trying to build on an Nvidia Jetson Nano, and it fails at the penultimate stage, python3 setup.py build_ext --inplace with the following. Note, I've disabled Gandiva since I ran into specific issues with that.

$ python3 setup.py build_ext 
/whatever/venv/lib/python3.6/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
  "Distutils was imported before Setuptools. This usage is discouraged "
running build_ext
-- Running cmake for pyarrow
cmake -DPYTHON_EXECUTABLE=/whatever/venv/bin/python3  -DPYARROW_BUILD_CUDA=on -DPYARROW_BUILD_FLIGHT=on -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off -DPYARROW_BUILD_ORC=on -DPYARROW_BUILD_PARQUET=on -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off -DPYARROW_BUNDLE_BOOST=off -DPYARROW_GENERATE_COVERAGE=off -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /tmp/arrow-apache-arrow-0.17.1/python
-- System processor: aarch64
-- Arrow build warning level: PRODUCTION
CMake Error at cmake_modules/SetupCxxFlags.cmake:338 (message):
  Unsupported arch flag: -march=.
Call Stack (most recent call first):
  CMakeLists.txt:100 (include)


-- Configuring incomplete, errors occurred!
See also "/tmp/arrow-apache-arrow-0.17.1/python/build/temp.linux-aarch64-3.6/CMakeFiles/CMakeOutput.log".
See also "/tmp/arrow-apache-arrow-0.17.1/python/build/temp.linux-aarch64-3.6/CMakeFiles/CMakeError.log".
error: command 'cmake' failed with exit status 1

The build succeeds for me after editing cmake_modules/SetupCxxFlags.cmake as follows:

Line 57:

  set(ARROW_ARMV8_ARCH_FLAG "-march=armv8-a")

Line 338-340 just commented out:

  # if(NOT CXX_SUPPORTS_ARMV8_ARCH)
  #   message(FATAL_ERROR "Unsupported arch flag: ${ARROW_ARMV8_ARCH_FLAG}.")
  # endif()

After those edits, running python3 setup.py build_ext --inplace succeeds, although I haven't actually used Arrow yet so I don't know if further issues await :)

Thanks to @heavyinfo for putting this together.
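
For anyone debugging the same error, one way to see whether the compiler itself rejects a flag, independently of CMake, is to feed it a trivial program. The sketch below does only a syntax-only pass, so it is not the same test that CMake performs; cc_supports_flag is a hypothetical helper name, and it also answers no when the compiler is not installed.

```shell
# cc_supports_flag FLAG [COMPILER]  (hypothetical helper, for illustration only)
# Prints "yes" if COMPILER (default: gcc) accepts FLAG, otherwise "no".
cc_supports_flag() (
    cc=${2:-gcc}
    # -fsyntax-only avoids writing any output file; "-x c -" reads C from stdin.
    if echo 'int main(void) { return 0; }' |
        "$cc" "$1" -fsyntax-only -x c - >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
)

cc_supports_flag -march=armv8-a
```

If this prints yes on the board while CMake still reports an unsupported flag, the problem is more likely in how the flag variable is populated than in the compiler.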

@TristanShoemaker

Line 338-340 just commented out:

Scary! I didn't try commenting out the error line, I'll give that a try as well. I'm still completely confused as to why gcc is refusing the flag, it's listed as a valid architecture in the documentation, and it's also the most general flag for the ARM cortex A-72.

@austinjp

Line 338-340 just commented out:

Scary! I didn't try commenting out the error line, I'll give that a try as well. I'm still completely confused as to why gcc is refusing the flag, it's listed as a valid architecture in the documentation, and it's also the most general flag for the ARM cortex A-72.

Indeed. Unfortunately I've run into multiple other errors with this, so I'm trying another approach. My end-goal is actually to install Huggingface's nlp but I'm encountering all sorts of inter-dependency issues. In case anyone cares, I'm currently trying with conda. Mid-way through so I can't yet report success or failure.

@abishekmuthian
Author

Interesting results @austinjp. I hope you guys are working with the release source and not a bleeding-edge git clone.

I'm currently trying with conda

Conda has always meant trouble on ARM for me, so I don't use it in spite of all the data science/ML projects making it the de facto install procedure. But, to be fair, Conda doesn't have a stable release for aarch64.

@zhuzhikun15973

Hello,

I've been following this very useful guide trying to get pyarrow running on a raspi 4.

Unfortunately, I've run into an error when running python3 setup.py build_ext --inplace

CMake Error at cmake_modules/SetupCxxFlags.cmake:368 (message): Unsupported arch flag: -march=.

I found a JIRA post about possible hardcoding in -march=armv8-a at the offending line, but this results in the same error.

Anybody have ideas? I'm not very familiar with cmake/ARM flags.
Thanks!

Hi, try to run python3 setup.py clean after you modify cmake_modules/SetupCxxFlags.cmake, then try python3 setup.py build_ext --inplace again.

@dietrich

dietrich commented Oct 15, 2020

OK - I have a build running. I'll post the script when it's finished. Meanwhile, the trick is that you only install the apt packages it needs to complete the cmake step successfully. After that, the make will download the source and compile anything you didn't already have, for example parquet.

I'm having the exact same issue as jakethesquid. cmake and make complete, but 'python3 setup.py build_ext --inplace' fails with "No package 'parquet' found". Did you post the build? Thank you

@lorenzave

Having the same issue building for the TX2. I'm trying to build version 0.17.1 as it's a required dependency for tensorflow 2.3.
When doing the arch hack it seems to work, but then it's not able to find the Arrow libs even though I set the path explicitly for the python cmake:
PYARROW_CMAKE_OPTIONS="-DARROW_LIB_DIR=/usr/local/lib/libarrow/" python3 setup.py build_ext --inplace

This is the error that I get. I even tried moving those FindCmakes to /usr/share/cmake-3.10/Modules/

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find Arrow (missing: ARROW_LIB_DIR) (found version "0.17.1")
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindArrow.cmake:412 (find_package_handle_standard_args)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:210 (find_package)

I am out of ideas any input is welcome :)

@znmeb

znmeb commented Oct 29, 2020

I'm still hacking away at this - I've had partial success but the best I've been able to do is get either Arrow C++ or PyArrow to work - if I do both there's some kind of namespace conflict and PyArrow stops working.

I know some NVIDIA engineers have gotten their RAPIDS framework, which includes Arrow, to work on a Jetson AGX Xavier. RAPIDS won't work on the Nano - it needs a newer GPU. But Arrow should.

I'm going to

a. ask this in the NVIDIA Developer Forum
b. keep hacking on a new strategy - local builds using the conda-forge tools

@z14git

z14git commented Nov 9, 2020

Having the same issue building for the TX2. I'm trying to build version 0.17.1 as it's a required dependency for tensorflow 2.3.
When doing the arch hack it seems to work, but then it's not able to find the Arrow libs even though I set the path explicitly for the python cmake:
PYARROW_CMAKE_OPTIONS="-DARROW_LIB_DIR=/usr/local/lib/libarrow/" python3 setup.py build_ext --inplace

This is the error that I get. I even tried moving those FindCmakes to /usr/share/cmake-3.10/Modules/

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find Arrow (missing: ARROW_LIB_DIR) (found version "0.17.1")
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindArrow.cmake:412 (find_package_handle_standard_args)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:210 (find_package)

I am out of ideas any input is welcome :)

Try export ARROW_HOME=/usr/local, not export ARROW_HOME=/usr/local/lib, before cmake.

@pandeyp

pandeyp commented Feb 3, 2021

Can I build a wheel that bundles the .so files into the wheel itself and delivers them to site-packages, as is done on normal x86 instances, without having to manually export LD_LIBRARY_PATH? A manual export requires restarting the process that is already running and hence is problematic for my use case.

@alexander-pv

alexander-pv commented Feb 12, 2021

OK - I have a build running. I'll post the script when it's finished. Meanwhile, the trick is that you only install the apt packages it needs to complete the cmake step successfully. After that, the make will download the source and compile anything you didn't already have, for example parquet.

I'm having the exact same issue as jakethesquid. cmake and make complete, but 'python3 setup.py build_ext --inplace' fails with "No package 'parquet' found". Did you post the build? Thank you

Hello. Perhaps this answer is very outdated. But I still decided to write here to help others, since I recently set up the build for the jetson device. There are many options that are written in /arrow/python/setup.py, so, for example, to build and to install pyarrow with parquet, you can write:

$ sudo -E python3 setup.py build_ext --with-parquet install

with CUDA support:
$ sudo -E python3 setup.py build_ext --with-cuda install
