Created
August 26, 2020 06:41
-
-
Save kiszk/daedf1e033a07d3f484655a69aac78ca to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Micah, thank you for expanding the scope for Big Endian support in Arrow. I am happy to see this when I am back from one-week vacation. I agree with this since we have just seen the kickoff of BE support in Go. Let me make comments | |
> 1. As long as there is CI in place to catch regressions (right now I think the CI is fairly unreliable?) | |
I agree. While TravisCI on s390x are unreliable, each platform can set up CI script. | |
> 2. No degradation in performance for little-endian architectures (verified by additional micro benchmarks) | |
Yes, we can do it. @kou suggested me to use the existing mechanism like [1]. Now, it supports C++.We could expand this to other languages. | |
> 3. Not a large amount of invasive code to distinguish between platforms. | |
Yes, I will prepare a draft of the principle for endian-independent code in another mail. | |
> Kazuaki Ishizaki I asked question previously, but could you give some data points around: | |
Sorry for the delay. I was waiting for comments during my one-week vacation. | |
> 1. The current state of C++ support (how much code needed to change)? | |
In my understanding, the PR [2] is the last PR to support big-endian for intra-process and inter-process in C++ except Parquet support. | |
> 2. How many more PRs you expect to need for Java (and approximate size)? | |
I have just submitted two PRs [5, 6] today to pass the current test using “mvn install”. The total size of the four PRs [3, 4, 5, 6] are around 200 lines. For CI for Java, I submitted one PR [7] (~100 lines). | |
In addition, I will submit another PR (~500 lines) to pass integration test. | |
BTW, I think that it would be good to add test cases to increase test coverage for little-endian and big-endian (e.g. increase the types of a schema to be tested in flight-core). If it is acceptable, we can submit more PRs regardless of endianness. | |
[1] https://github.com/apache/arrow/pull/7940#issuecomment-672690540 | |
[2] https://github.com/apache/arrow/pull/7507 | |
[3] https://github.com/apache/arrow/pull/7942 | |
[4] https://github.com/apache/arrow/pull/7944 | |
[5] https://github.com/apache/arrow/pull/8056 | |
[6] https://github.com/apache/arrow/pull/8057 | |
[7] https://github.com/apache/arrow/pull/7938 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment