Install pyarrow on alpine in docker
FROM python:3.7-alpine3.8
RUN apk add --no-cache \
    build-base \
    cmake \
    bash \
    jemalloc-dev \
    boost-dev \
    autoconf \
    zlib-dev \
    flex \
    bison
RUN pip install --no-cache-dir six pytest numpy cython
RUN pip install --no-cache-dir pandas
ARG ARROW_VERSION=0.12.0
ARG ARROW_SHA1=2ede75769e12df972f0acdfddd53ab15d11e0ac2
ARG ARROW_BUILD_TYPE=release
ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local
# Download, verify, and build apache-arrow.
# sha1sum -c aborts the build if ARROW_SHA1 does not match the downloaded
# archive, so update ARROW_SHA1 whenever ARROW_VERSION changes.
RUN mkdir /arrow \
    && apk add --no-cache curl \
    && curl -o /tmp/apache-arrow.tar.gz -SL https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz \
    && echo "$ARROW_SHA1 */tmp/apache-arrow.tar.gz" | sha1sum -c - \
    && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DARROW_PARQUET=on \
        -DARROW_PYTHON=on \
        -DARROW_PLASMA=on \
        -DARROW_BUILD_TESTS=OFF \
        .. \
    && make -j$(nproc) \
    && make install \
    && cd /arrow/python \
    && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
    && python setup.py install \
    && rm -rf /arrow /tmp/apache-arrow.tar.gz

@jensenity jensenity commented Nov 25, 2019

Hi,

First of all, thank you for this script; it makes it much easier to get pyarrow working on Alpine in my Dockerfile.

I can see that the pyarrow version you are using is 0.12.0, and I'm planning to use a newer version for its JSON library functions.

When I tried version 0.15.1, the build failed with the following log:

[ 24%] Completed 'rapidjson_ep'
[ 24%] Built target rapidjson_ep
Scanning dependencies of target flatbuffers_ep
[ 24%] Creating directories for 'flatbuffers_ep'
[ 25%] Performing download step (download, verify and extract) for 'flatbuffers_ep'
-- flatbuffers_ep download command succeeded.  See also /arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-download-*.log
[ 25%] No patch step for 'flatbuffers_ep'
[ 25%] No update step for 'flatbuffers_ep'
[ 26%] Performing configure step for 'flatbuffers_ep'
-- flatbuffers_ep configure command succeeded.  See also /arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-configure-*.log
[ 26%] Performing build step for 'flatbuffers_ep'
CMake Error at /arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-RELEASE.cmake:16 (message):
  Command failed: 2

   'make'

  See also

    /arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log


make[2]: *** [CMakeFiles/flatbuffers_ep.dir/build.make:112: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:730: CMakeFiles/flatbuffers_ep.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
-- brotli_ep build command succeeded.  See also /arrow/cpp/build/brotli_ep-prefix/src/brotli_ep-stamp/brotli_ep-build-*.log
[ 26%] Performing install step for 'brotli_ep'
-- brotli_ep install command succeeded.  See also /arrow/cpp/build/brotli_ep-prefix/src/brotli_ep-stamp/brotli_ep-install-*.log
[ 27%] Completed 'brotli_ep'
[ 27%] Built target brotli_ep
-- jemalloc_ep build command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-build-*.log
[ 27%] Performing install step for 'jemalloc_ep'
-- jemalloc_ep install command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-install-*.log
[ 28%] Completed 'jemalloc_ep'
[ 28%] Built target jemalloc_ep
-- thrift_ep build command succeeded.  See also /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-*.log
[ 28%] Performing install step for 'thrift_ep'
-- thrift_ep install command succeeded.  See also /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-*.log
[ 28%] Completed 'thrift_ep'
[ 28%] Built target thrift_ep
make: *** [Makefile:141: all] Error 2

Do you know why?
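
The CMake error above only says that 'make' failed for flatbuffers_ep and points at per-step log files; the actual compiler error is in those files. A minimal sketch, assuming the build directory from the log (/arrow/cpp/build), that prints the build logs of one external project. `show_ep_logs` is a hypothetical helper name, not part of Arrow or CMake:

```shell
#!/bin/sh
# show_ep_logs is a hypothetical helper, not part of Arrow or CMake.
# Usage: show_ep_logs <build-dir> <project>
#   e.g. show_ep_logs /arrow/cpp/build flatbuffers_ep
show_ep_logs() {
    build_dir=$1
    project=$2
    stamp_dir="$build_dir/$project-prefix/src/$project-stamp"
    status=1   # stays 1 (failure) until at least one log file is printed
    for log in "$stamp_dir/$project"-build-*.log; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -f "$log" ] || continue
        echo "==> $log"
        cat "$log"
        status=0
    done
    return "$status"
}
```

Because a failed RUN step discards its filesystem changes, the logs have to be printed inside the same RUN step that runs make, before that step exits with an error.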

@jensenity jensenity commented Nov 25, 2019

These are my build arguments. Do I have to update ARROW_SHA1?

ARG ARROW_VERSION=0.15.1
ARG ARROW_SHA1=2ede75769e12df972f0acdfddd53ab15d11e0ac2
ARG ARROW_BUILD_TYPE=release

ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local
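
On the ARROW_SHA1 question: piping an expected digest into `sha1sum FILE` only prints the file's digest and never compares it, so a stale value goes unnoticed. A minimal sketch of the checking form, assuming a `sha1sum` that supports `-c` (GNU coreutils and BusyBox both do). `verify_sha1` is a hypothetical helper name:

```shell
#!/bin/sh
# verify_sha1 is a hypothetical helper: it succeeds only if FILE has the
# expected SHA-1 digest.
# Usage: verify_sha1 <expected-hex-digest> <file>
verify_sha1() {
    expected=$1
    file=$2
    # "digest  path" (two spaces) is the checksum-line format sha1sum -c reads.
    printf '%s  %s\n' "$expected" "$file" | sha1sum -c - >/dev/null 2>&1
}
```

With a check like this in place, the digest does have to change with the version: download the new archive once, run `sha1sum` on it, and set ARROW_SHA1 to the printed value.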

@uros-r uros-r commented Dec 12, 2019

0.14.0 builds fine for me; however, I have hit the same issue trying to build 0.15.1.

@jensenity have you been able to get past this since reporting?

@uros-r uros-r commented Dec 14, 2019

I haven't gotten to the bottom of the flatbuffers_ep issue with 0.15.1, but building PyArrow from master using the above approach worked fine.
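
Switching between a tagged release and master only changes the archive URL. A minimal sketch, assuming GitHub's usual archive URL scheme for the master branch (the release form is the one used in the Dockerfile above). `arrow_archive_url` is a hypothetical helper name:

```shell
#!/bin/sh
# arrow_archive_url is a hypothetical helper that prints the GitHub archive URL
# for a released version (tagged apache-arrow-<version>) or for the master branch.
# Usage: arrow_archive_url <version|master>
arrow_archive_url() {
    case "$1" in
        master) echo "https://github.com/apache/arrow/archive/master.tar.gz" ;;
        *)      echo "https://github.com/apache/arrow/archive/apache-arrow-$1.tar.gz" ;;
    esac
}
```

Note that a master archive changes with every commit, so a pinned checksum cannot be used with it and the build is not reproducible.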

@xiaodaigh xiaodaigh commented Dec 31, 2019

Are there instructions for compiling from scratch but aimed at using it from R? After compiling it this way, can I still use it from R?

@beyoung beyoung commented Jan 15, 2020

FROM python:3.6-alpine
RUN apk update \
    && apk upgrade \
    && apk add --no-cache build-base \
            cmake \
            bash \
            boost-dev \
            autoconf \
            zlib-dev \
            libressl-dev \
            flex \
            bison \
    && pip install six pytest numpy cython pandas 

ARG ARROW_BUILD_TYPE=release

ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local 

RUN mkdir -p /arrow \
    && apk add --no-cache curl \
    && curl -o /tmp/apache-arrow.zip -SL https://codeload.github.com/apache/arrow/zip/master \
    && unzip /tmp/apache-arrow.zip \
    && mv arrow-master/* /arrow/ \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
          -DOPENSSL_ROOT_DIR=/usr/local/ssl \
          -DCMAKE_INSTALL_LIBDIR=lib \
          -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
          -DARROW_WITH_BZ2=ON \
          -DARROW_WITH_ZLIB=ON \
          -DARROW_WITH_ZSTD=ON \
          -DARROW_WITH_LZ4=ON \
          -DARROW_WITH_SNAPPY=ON \
          -DARROW_PARQUET=ON \
          -DARROW_PYTHON=ON \
          -DARROW_PLASMA=ON \
          -DARROW_BUILD_TESTS=OFF \
          .. \
    && make -j$(nproc) \
    && make install \
    && cd /arrow/python \
    && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
    && python setup.py install \
    && rm -rf /arrow /arrow-master /tmp/apache-arrow.zip

This works to build arrow 0.15.1

@JackLeo JackLeo commented May 28, 2021

For Arrow 3.0.0 with Python 3.8:

FROM python:3.8-alpine

RUN apk update \
    && apk upgrade \
    && apk add --no-cache build-base \
        autoconf \
        bash \
        bison \
        boost-dev \
        cmake \
        flex \
        libressl-dev \
        zlib-dev

RUN pip install --no-cache-dir six pytest numpy cython
RUN pip install --no-cache-dir pandas

ARG ARROW_VERSION=3.0.0
ARG ARROW_SHA1=c1fed962cddfab1966a0e03461376ebb28cf17d3
ARG ARROW_BUILD_TYPE=release

ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local

#Download and build apache-arrow
RUN mkdir /arrow \
    && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz \
    && echo "${ARROW_SHA1} */tmp/apache-arrow.tar.gz" | sha1sum -c - \
    && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
        -DOPENSSL_ROOT_DIR=/usr/local/ssl \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_PARQUET=ON \
        -DARROW_PYTHON=ON \
        -DARROW_PLASMA=ON \
        -DARROW_BUILD_TESTS=OFF \
        .. \
    && make -j$(nproc) \
    && make install \
    && cd /arrow/python \
    && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
    && python setup.py install \
    && rm -rf /arrow /tmp/apache-arrow.tar.gz