@niyue
Last active September 13, 2022 09:48
Develop pyarrow with Visual Studio Code Remote Containers
  1. Clone the arrow git repo locally (on macOS in my case)
  2. Open the arrow/python folder with vscode
  3. Create a Dockerfile for pyarrow development. pyarrow already provides one, so you only need to symlink it from the arrow/python folder: ln -s examples/minimal_build/Dockerfile.ubuntu Dockerfile
  4. Follow the steps here to open a remote container as the dev env in vscode
    1. Run "Remote-Containers: Open Folder in Container..." in vscode
    2. Choose the Dockerfile you just linked
    3. vscode builds the container and then opens it; this can take a while, so just wait
  5. Enter the container to build pyarrow
    1. docker exec -it <container_id> bash
    2. vscode syncs all files under the /workspaces folder, so point the WORKDIR env variable to /workspaces
      1. export WORKDIR=/workspaces && export ARROW_ROOT=$WORKDIR/arrow
    3. (optional) Edit the build script to use the synced folder
      1. Edit /workspaces/arrow/python/examples/minimal_build/build_venv.sh
      2. Set ARROW_ROOT=${ARROW_ROOT:-/arrow}, which keeps an already exported ARROW_ROOT and falls back to /arrow only when it is unset or empty
  6. Build with the build script, run from the minimal_build folder above: ./build_venv.sh
  7. If you want to edit pyarrow files and rebuild pyarrow without re-building the cpp files, set up the environment first
    export WORKDIR=/workspaces
    export LIBRARY_INSTALL_DIR=$WORKDIR/local-libs
    export CPP_BUILD_DIR=$WORKDIR/arrow-cpp-build
    export ARROW_HOME=$WORKDIR/dist
    export ARROW_ROOT=$WORKDIR/arrow
    export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
    export PYARROW_BUILD_TYPE=Debug
    export PYARROW_CMAKE_GENERATOR=Ninja
    export PYARROW_WITH_PARQUET=1
    export PYARROW_WITH_FLIGHT=1 # if arrow flight is needed
    export ARROW_TEST_DATA="$ARROW_ROOT/testing/data"
    export PARQUET_TEST_DATA="$ARROW_ROOT/cpp/submodules/parquet-testing/data"
    
    pip install -r $ARROW_ROOT/python/requirements-build.txt
    1. Run the pyarrow build
    pushd $ARROW_ROOT/python
    python setup.py develop
    2. Run the pyarrow tests (from the same $ARROW_ROOT/python folder)
    pip install -r $ARROW_ROOT/python/requirements-test.txt
    py.test pyarrow
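
The exports in step 7 have to be repeated in every new shell inside the container, so it may help to keep them in one function that can be sourced. The following is only a sketch, not part of the arrow repo: the function name `setup_pyarrow_env` is hypothetical, and the `${VAR:-default}` fallbacks and the duplicate check on `LD_LIBRARY_PATH` are additions on top of the plain exports above. It uses the same `${ARROW_ROOT:-...}` pattern as the optional build script edit in step 5.

```shell
# Hypothetical helper: collects the environment from step 7 in one place.
# Source this file inside the container, e.g. from ~/.bashrc.
setup_pyarrow_env() {
    # ${VAR:-default} keeps a value that is already set and non-empty,
    # and falls back to the default otherwise.
    export WORKDIR="${WORKDIR:-/workspaces}"
    export ARROW_ROOT="${ARROW_ROOT:-$WORKDIR/arrow}"
    export ARROW_HOME="${ARROW_HOME:-$WORKDIR/dist}"
    export LIBRARY_INSTALL_DIR="${LIBRARY_INSTALL_DIR:-$WORKDIR/local-libs}"
    export CPP_BUILD_DIR="${CPP_BUILD_DIR:-$WORKDIR/arrow-cpp-build}"

    # Prepend the Arrow C++ library folder only if it is not listed yet,
    # so calling the function twice does not grow LD_LIBRARY_PATH.
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$ARROW_HOME/lib:"*) ;;
        *) export LD_LIBRARY_PATH="$ARROW_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac

    export PYARROW_BUILD_TYPE=Debug
    export PYARROW_CMAKE_GENERATOR=Ninja
    export PYARROW_WITH_PARQUET=1
    export PYARROW_WITH_FLIGHT=1 # if arrow flight is needed
    export ARROW_TEST_DATA="$ARROW_ROOT/testing/data"
    export PARQUET_TEST_DATA="$ARROW_ROOT/cpp/submodules/parquet-testing/data"
}

setup_pyarrow_env
```

After sourcing it, the build and test commands from step 7 can be run as they are. Exporting a different WORKDIR or ARROW_ROOT before sourcing overrides the defaults.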
    