@animeshtrivedi
Last active November 27, 2019 16:25
This patch makes the following changes:
* moves two common functions, "getNullCount" and "splitAndTransferValidityBuffer", up to the top-level BaseValueVector class. This change requires moving "validityBuffer" into BaseValueVector (as recommended in this TODO: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L89)
* optimizes the implementation of loadValidityBuffer (in BaseValueVector) so that it simply passes along a reference to the validity buffer read from storage
* optimizes for the common boundary condition in which all values are valid, i.e., the null count is zero (as the C++ code does: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L290)
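The three changes above can be illustrated with a simplified, self-contained model. This is a sketch under stated assumptions, not the patch itself: the class names (`ValiditySketch`, `SketchBaseVector`, `SketchIntVector`) and the `splitValidity` helper are hypothetical, and a plain `byte[]` stands in for Arrow's off-heap buffers. It assumes Arrow's validity-bitmap layout (bit `i`, least-significant bit first within each byte, is 1 when slot `i` is non-null). The shared base class owns the bitmap, adopts it by reference when loading, counts nulls with a popcount, and lets `isNull` skip the bitmap lookup when the null count is zero.

```java
/**
 * Hypothetical, simplified model of an Arrow-style vector. NOT the Arrow API:
 * a byte[] stands in for the validity buffer (bit i set = slot i non-null,
 * least-significant bit first within each byte).
 */
public class ValiditySketch {

    /** Shared base class that owns the validity bitmap (cf. BaseValueVector). */
    static abstract class SketchBaseVector {
        protected byte[] validityBuffer;
        protected int valueCount;
        protected int nullCount = -1; // -1 means "not computed yet"

        /** Adopts the caller's bitmap by reference instead of copying it. */
        void loadValidityBuffer(byte[] bitmap, int valueCount) {
            this.validityBuffer = bitmap;
            this.valueCount = valueCount;
            this.nullCount = -1;
        }

        /** Nulls = valueCount minus the set bits among the first valueCount bits. */
        int getNullCount() {
            if (nullCount < 0) {
                int fullBytes = valueCount / 8;
                int set = 0;
                for (int i = 0; i < fullBytes; i++) {
                    set += Integer.bitCount(validityBuffer[i] & 0xFF);
                }
                int remainder = valueCount % 8;
                if (remainder != 0) {
                    int mask = (1 << remainder) - 1; // keep only the low 'remainder' bits
                    set += Integer.bitCount(validityBuffer[fullBytes] & mask);
                }
                nullCount = valueCount - set;
            }
            return nullCount;
        }

        boolean isNull(int index) {
            // Fast path: with no nulls, skip the per-element bitmap lookup.
            if (getNullCount() == 0) {
                return false;
            }
            return ((validityBuffer[index >> 3] >> (index & 7)) & 1) == 0;
        }

        /** Copies validity bits [start, start + length) into a new zero-based bitmap. */
        byte[] splitValidity(int start, int length) {
            byte[] out = new byte[(length + 7) / 8];
            for (int i = 0; i < length; i++) {
                int src = start + i;
                if (((validityBuffer[src >> 3] >> (src & 7)) & 1) != 0) {
                    out[i >> 3] |= (byte) (1 << (i & 7));
                }
            }
            return out;
        }
    }

    /** Hypothetical int vector that inherits all validity handling from the base. */
    static final class SketchIntVector extends SketchBaseVector {
        final int[] values;

        SketchIntVector(int[] values, byte[] bitmap) {
            this.values = values;
            loadValidityBuffer(bitmap, values.length);
        }

        /** Sums the non-null values. */
        long sumNonNull() {
            long sum = 0;
            for (int i = 0; i < valueCount; i++) {
                if (!isNull(i)) {
                    sum += values[i];
                }
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // Slot 3 is null: byte 0 = 0b11110111, byte 1 = 0b00000011 (slots 8 and 9 valid).
        byte[] bitmap = {(byte) 0xF7, (byte) 0x03};
        SketchIntVector v = new SketchIntVector(values, bitmap);
        System.out.println(v.getNullCount()); // 1
        System.out.println(v.sumNonNull());   // 51
        System.out.println(Integer.toBinaryString(v.splitValidity(2, 4)[0] & 0xFF)); // 1101
    }
}
```

Caching the null count makes the all-valid check a single comparison per call; a production implementation would more likely hoist that check out of the scan loop entirely.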
These optimizations deliver a substantial performance improvement.
Test: reading 50M integers from a single Int column (2GB).
Before the patch:
* Baseline: 7.64 Gb/sec
* With the Holder API: 9.99 Gb/sec

After the patch (with the bitmap condition checks):
* Baseline: 12.13 Gb/sec (+58.7%)
* With the Holder API: 16.03 Gb/sec (+60.4%)
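To make the throughput figures concrete, here is a hypothetical timing sketch (not the harness behind the numbers above) that scans an `int[]` column and converts elapsed time to Gb/sec, assuming "Gb" means gigabits (10^9 bits). Because bits per nanosecond equals gigabits per second, the conversion is simply `bytes * 8 / nanos`. The class name `ThroughputSketch` is illustrative, the column is scaled down from 50M to 5M values to keep memory small, and there is no JIT warm-up, so the printed figure is only indicative; a harness such as JMH is the usual choice for real measurements.

```java
/**
 * Hypothetical micro-benchmark sketch: time one sequential scan over an int
 * column and report throughput in Gb/sec (gigabits per second).
 */
public class ThroughputSketch {

    /** Gbit/s = (bytes * 8) bits / nanos ns, since 1 bit/ns == 1 Gbit/s. */
    static double gbps(long bytes, long nanos) {
        return (bytes * 8.0) / nanos;
    }

    public static void main(String[] args) {
        int n = 5_000_000; // scaled down from the 50M integers in the test description
        int[] column = new int[n];
        for (int i = 0; i < n; i++) {
            column[i] = i;
        }

        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += column[i]; // touch every value so the scan is not optimized away
        }
        long elapsed = System.nanoTime() - start;

        long bytes = (long) n * Integer.BYTES;
        System.out.printf("sum=%d, %.2f Gb/sec%n", sum, gbps(bytes, elapsed));
    }
}
```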