Last active
November 27, 2019 16:25
This patch makes the following changes:
* moves two common functions, "getNullCount" and "splitAndTransferValidityBuffer", to the top-level BaseValueVector. This change requires moving "validityBuffer" to the BaseValueVector class (as recommended in this TODO: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L89)
* optimizes the implementation of loadValidityBuffer (in BaseValueVector) to just pass along the reference to the validity buffer read from storage
* optimizes for the common boundary condition when all values are valid (as done in the C++ code: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L290)
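The all-valid fast path can be illustrated with a minimal, self-contained sketch. This is not the patch itself and does not use the Arrow Java API: the class ValiditySketch, its fields, and its methods are hypothetical names chosen for illustration, and a plain byte[] stands in for the validity buffer. It assumes the Arrow bitmap convention of one bit per value, least-significant bit first, with a set bit meaning "valid".

```java
// Hypothetical stand-in for a vector's validity handling; NOT the Arrow Java API.
final class ValiditySketch {
    private final byte[] validityBits; // one bit per value, least-significant bit first
    private final int valueCount;
    private final boolean allValid;    // computed once, when the buffer is handed over

    ValiditySketch(byte[] validityBits, int valueCount) {
        // Keep the reference to the buffer as given instead of copying it.
        this.validityBits = validityBits;
        this.valueCount = valueCount;
        this.allValid = countNulls(validityBits, valueCount) == 0;
    }

    // Counts unset bits among the first valueCount bits of the bitmap.
    static int countNulls(byte[] bits, int valueCount) {
        int set = 0;
        int fullBytes = valueCount / 8;
        for (int i = 0; i < fullBytes; i++) {
            set += Integer.bitCount(bits[i] & 0xFF);
        }
        int remainder = valueCount % 8;
        if (remainder != 0) {
            int mask = (1 << remainder) - 1; // ignore padding bits in the last byte
            set += Integer.bitCount(bits[fullBytes] & mask);
        }
        return valueCount - set;
    }

    int getNullCount() {
        return countNulls(validityBits, valueCount);
    }

    boolean isNull(int index) {
        if (allValid) {
            return false; // fast path: no bitmap access when every value is valid
        }
        return (validityBits[index >> 3] & (1 << (index & 7))) == 0;
    }
}
```

The flag is computed once per loaded buffer, so a reader that calls isNull for every value pays for a single boolean check per call in the common case where the column contains no nulls.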
Together, these optimizations improve read throughput by roughly 60% in the test below.
Tests: Read 50M integers from a single Int column (2GB).
Before the patch:
Baseline: 7.64 Gb/sec
With the Holder API: 9.99 Gb/sec
After the patch (with the bitmap condition checks):
Baseline: 12.13 Gb/sec (+58.7% gains)
With the Holder API: 16.03 Gb/sec (+60.4% gains)