Created July 18, 2019 03:10
jordansamuels/d69f1c22c58418f5dfa0785b9ecd211e
Python script to reproduce pyarrow 0.14.0 only reading part of a valid gzip csv file
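For context, concatenating two gzip files byte-for-byte yields a valid multi-member gzip stream, which is the kind of file this gist builds. The following is a minimal standard-library sketch of how a reader that stops after the first member would see only part of the data. It does not use pyarrow and does not confirm the cause of the pyarrow behavior; it only shows one mechanism consistent with the symptom. The names `member1`, `member2`, and `stream` are illustrative.

```python
import gzip
import zlib

# Two independent gzip members, joined byte-for-byte (what `cat` does to files).
member1 = gzip.compress(b"x\n1\n")
member2 = gzip.compress(b"2\n")
stream = member1 + member2

# gzip.decompress continues past the first member's trailer and reads both.
all_members = gzip.decompress(stream)

# A single zlib decompress object (MAX_WBITS | 16 selects the gzip wrapper)
# stops at the end of the first member; the rest is left in `unused_data`.
d = zlib.decompressobj(zlib.MAX_WBITS | 16)
first_only = d.decompress(stream)
leftover = d.unused_data

print(all_members)          # b'x\n1\n2\n'
print(first_only)           # b'x\n1\n'
print(leftover == member2)  # True
```

A reader that wants every member has to restart decompression on the leftover bytes until none remain; whether pyarrow 0.14.0 skips that step is an assumption here, not something this sketch verifies.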
import os

import pandas as pd
import pyarrow
import pyarrow.csv as pcsv

# Write two single-row gzip CSVs; only the first carries the header line.
pd.DataFrame({'x': [1]}).to_csv('/tmp/1.csv.gz', index=False, compression='gzip')
pd.DataFrame({'x': [2]}).to_csv('/tmp/2.csv.gz', header=False, index=False, compression='gzip')

# Concatenate them byte-for-byte into one multi-member gzip file (needs a Unix shell).
os.system("cat /tmp/1.csv.gz /tmp/2.csv.gz > /tmp/t.csv.gz")

print("pyarrow.csv only reads one row:")
print(pcsv.read_csv('/tmp/t.csv.gz').to_pandas())

print("pandas reads two rows:")
print(pd.read_csv('/tmp/t.csv.gz'))

print("pyarrow version: " + pyarrow.__version__)
Output is: