When using `rio::import()` to read a CSV, the data gets truncated at the first line that has an error; in the example described here, that is a line with a different number of columns.
This does not happen when using `readr::read_csv()`.
Dataset:
- Source: Data on Dengue surveillance in Peru, https://www.datosabiertos.gob.pe/dataset/vigilancia-epidemiol%C3%B3gica-de-dengue
- This dataset has 501,692 rows (as of 2024-04-24)
@jmcastagnetto If `readr::read_csv()` behaves the way you want, then use it. As stated in `fread()`'s help file: "fread is for regular delimited files; i.e., where every row has the same number of columns." This is real-life (bad) data after all, and teaching students to try different tools is not a bad idea. But as @schochastics said, one can argue whether `readr::read_csv()` is actually doing the right thing.
Perhaps you already know the issue, but I'll reiterate it anyway. To extract the essence of the problem:
The file uses "," as the separator, but at the same time uses "slash ," (a backslash followed by a comma) to try to escape "," in cases like "ENACE I,II,III".
`fread`'s `sep` parameter can accept regex but AFAIK, it's difficult to express ", but not slash ," in regex.
What I would usually do is just apply some standard Unix tools:
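The exact commands are not shown here, so the following is only a minimal sketch of that kind of pre-processing with `sed`. It assumes the escape sequence is a literal backslash followed by a comma, and that ";" never occurs in the data, so it can stand in for the escaped commas. The file names `raw.csv` and `clean.csv` and the three toy rows are made up for illustration.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
workdir=$(mktemp -d)

# Toy file standing in for the real CSV (contents are made up).
# The third line carries backslash-escaped commas inside one field.
printf '%s\n' \
  'id,localidad,casos' \
  '1,LIMA,10' \
  '2,ENACE I\,II\,III,5' > "$workdir/raw.csv"

# Replace every backslash-comma pair with ";" (an assumed placeholder),
# so that every row ends up with the same number of comma-separated fields.
sed 's/\\,/;/g' "$workdir/raw.csv" > "$workdir/clean.csv"

cat "$workdir/clean.csv"
```

The cleaned file should then be regular enough to read with `rio::import()`; the placeholder can be turned back into a comma afterwards if the original text matters.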
I think this is perhaps the "correct" way; at least it does not hide the problem the way `readr::read_csv()`'s default does. But I would also agree that it is difficult to teach beginners to do these things. It's also OS dependent.
If one prefers a pure R solution that uses nothing but `rio` and base functions: