@michaelaye
Created November 5, 2012 08:31
pandas parsing bug?
import pandas
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
s = '"09-Apr-2012", "01:10:18.300", 2456026.548822908, 12849, 1.00361, 1.12551, 330.65659, 0355626618.16711, 73.48821, 314.11625, 1917.09447, 179.71425, 80.000, 240.000, -350, 70.06056, 344.98370, 1, 1, -0.689265, -0.692787, 0.212036, 14.7674, 41.605, -9999.0, -9999.0, -9999.0, -9999.0, -9999.0, -9999.0, 000, 012, 128'
sfile = StringIO(s)
# the row has 33 columns (0..32); column 29 is one of the -9999.0 fields
pandas.io.parsers.read_csv(sfile, names=range(33), na_values=['-9999.0'])[29]
@michaelaye

Running this gives:
0 -9999
Name: 29

whereas I expected:

0 NaN
Name: 29

@michaelaye

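I think I know why this happens: the values are padded with a varying amount of whitespace after each comma, so the field is probably read as ' -9999.0' and never matches the na_values entry '-9999.0'. This can be handled with a csv.Dialect object, which can be created with csv.Sniffer().sniff(fobject.read(2048)). Note that sniff() takes a sample string, not the file object itself.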
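A minimal sketch of the whitespace effect, using only the standard-library csv module rather than pandas. The shortened row and the helper `count_na` are made up for illustration, and instead of sniffing a dialect the sketch sets `skipinitialspace` directly, which is the dialect attribute that controls this behaviour:

```python
import csv
from io import StringIO

# Shortened, hypothetical stand-in for the row in the gist: a quoted field
# followed by values that have a space after each comma.
line = '"09-Apr-2012", 14.7674, -9999.0, 012'

# Default reader: the space after each comma stays part of the field.
raw = next(csv.reader(StringIO(line)))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
clean = next(csv.reader(StringIO(line), skipinitialspace=True))

NA_VALUES = {'-9999.0'}  # same sentinel string as na_values in the gist


def count_na(fields):
    """Count the fields that exactly match one of the NA sentinel strings."""
    return sum(field in NA_VALUES for field in fields)


print(repr(raw[2]), count_na(raw))
print(repr(clean[2]), count_na(clean))
```

Current pandas releases expose the same switch as `read_csv(..., skipinitialspace=True)`. Whether that option was available in the pandas version used for this gist is not checked here.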