Created
November 9, 2018 10:28
@dclong I'm also seeing this behaviour with the newest Arrow release. Did you open an upstream bug about this already?
No. I figured out that the default configuration has changed. So, one way to fix the issue is to customize the underlying configuration.
This looks awesome and I am trying to implement it, but I am running into errors. Like @dclong, I am trying to connect to Hive. I can successfully get the "batch" (a VectorSchemaRoot object). However, when I pass the VectorSchemaRoot to pyarrow.jvm.record_batch, it gives me the following error: expected bytes, java.lang.String found.
Have either of you tried this gist out recently? Would you know what the issue might be?
I followed your example here to use pyarrow.jvm to query a Hive database.
However, after running the code, it returns only 1024 rows.
Basically, batch.getRowCount() returns 1024. The table I query is huge and has far more than 1024 rows. Do you have any idea what might have caused the issue?
Do I have to use a customized configuration?
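To illustrate why a single batch can stop at 1024 rows, here is a minimal pure-Python sketch. It assumes the JDBC-to-Arrow adapter delivers the result set in fixed-size batches, with 1024 as the default target size; that is my reading of the "default configuration has changed" reply above, not something confirmed in this thread. The names iter_batches and read_all are hypothetical stand-ins and are not part of pyarrow or the Arrow Java API.

```python
from itertools import islice


def iter_batches(rows, batch_size=1024):
    """Yield lists of at most batch_size rows.

    Hypothetical stand-in for an adapter that hands back the result
    set one batch at a time instead of all at once.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def read_all(rows, batch_size=1024):
    """Drain every batch so that no rows are silently dropped."""
    collected = []
    for batch in iter_batches(rows, batch_size):
        collected.extend(batch)
    return collected


# Pretend the Hive table has 5000 rows.
source = range(5000)

# Reading only the first batch reproduces the symptom: 1024 rows.
first_batch = next(iter_batches(source))

# Option 1: keep the batch size and iterate over all batches.
all_rows = read_all(source)

# Option 2: raise the batch size above the row count (the "customized
# configuration" route); the first batch then holds everything.
larger_batch = next(iter_batches(source, batch_size=10_000))

print(len(first_batch), len(all_rows), len(larger_batch))
```

If the assumption holds, the two practical routes are the same as in the sketch: either consume every batch the adapter produces, or configure a larger target batch size on the Java side before converting the result to pyarrow.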