pandas parsing bug?

gistfile1.py
import pandas
from StringIO import StringIO
 
s = '"09-Apr-2012", "01:10:18.300", 2456026.548822908, 12849, 1.00361, 1.12551, 330.65659, 0355626618.16711, 73.48821, 314.11625, 1917.09447, 179.71425, 80.000, 240.000, -350, 70.06056, 344.98370, 1, 1, -0.689265, -0.692787, 0.212036, 14.7674, 41.605, -9999.0, -9999.0, -9999.0, -9999.0, -9999.0, -9999.0, 000, 012, 128'
 
sfile = StringIO(s)
# it's 33 columns
pandas.io.parsers.read_csv(sfile, names=range(33), na_values=['-9999.0'])[29]

Running this gives:
0 -9999
Name: 29

while I expect:

0 NaN
Name: 29

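I think I know why this happens: each delimiter is followed by a variable amount of whitespace, so the field is read as ' -9999.0' (with a leading space) and does not match the na_values entry '-9999.0'. This can be handled with a csv.Dialect object, which can be created with csv.Sniffer().sniff(fobject.read(2048)); note that sniff() takes a text sample, not the file object itself.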
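The whitespace explanation can be checked with the standard csv module alone. The following is a minimal sketch, not a fix for pandas itself: the `sample` row is a shortened, hypothetical version of the data above, and the names `raw`, `clean`, and `na_values` are made up for illustration.

```python
import csv
from io import StringIO

# Shortened, hypothetical row in the same shape as the data above:
# quoted text fields followed by numbers, each preceded by a space.
sample = '"09-Apr-2012", "01:10:18.300", 12849, -9999.0, -9999.0, 128\n'

# Default parsing keeps the space after each comma as part of the field,
# so an exact comparison against '-9999.0' fails.
raw = next(csv.reader(StringIO(sample)))

# Sniffer.sniff() takes a text sample (not a file object) and returns
# a Dialect subclass describing the guessed format.
dialect = csv.Sniffer().sniff(sample)

# Setting skipinitialspace explicitly avoids relying on the sniffer's guess.
clean = next(csv.reader(StringIO(sample), skipinitialspace=True))

# Compare each field against the missing-value marker.
na_values = {'-9999.0'}
raw_missing = [field in na_values for field in raw]
clean_missing = [field in na_values for field in clean]

print(raw[3:5], raw_missing)
print(clean[3:5], clean_missing)
```

A sniffed dialect can also be passed directly, as in `csv.reader(fobject, dialect)`. In pandas versions that support them, `read_csv` accepts `skipinitialspace=True` or a `dialect` argument for the same purpose; whether that resolves the behaviour reported here depends on the pandas version in use.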
