jmcastagnetto/README.md

## README.md

      
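When rio::import() reads a CSV file, the data is truncated at the first malformed line. In the example described here, the malformed line has a different number of columns than the rest of the file (16 fields instead of the expected 14), and import() stops with a warning instead of reading the remaining rows.

This does not happen with readr::read_csv(), which reads every row and reports the parsing issues in a warning.

Dataset: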

Source: Data on Dengue surveillance in Peru, https://www.datosabiertos.gob.pe/dataset/vigilancia-epidemiol%C3%B3gica-de-dengue

CSV file: https://www.datosabiertos.gob.pe/sites/default/files/datos_abiertos_vigilancia_dengue.csv


This dataset has 501,692 rows (as of 2024-04-24)
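Before importing, it can help to find which lines have an unexpected number of fields. The following is a minimal sketch using base R's `count.fields()`. The helper name `find_irregular_lines()` and the small sample file are hypothetical and made up for illustration; they are not part of the original report, and the sketch has not been run against the dengue dataset itself.

```r
# Hypothetical helper: report lines whose field count differs from the header's.
find_irregular_lines <- function(path, sep = ",") {
  # Count fields on every line; keep blank lines so that positions in the
  # result match line numbers in the file, and do not treat "#" as a comment.
  n_fields <- count.fields(path, sep = sep, quote = "\"",
                           blank.lines.skip = FALSE, comment.char = "")
  expected <- n_fields[1]  # the header line defines the expected width
  bad <- which(!is.na(n_fields) & n_fields != expected)
  data.frame(line = bad,
             n_fields = n_fields[bad],
             expected = rep(expected, length(bad)))
}

# Made-up sample data: the third line contains backslash-escaped commas,
# which are still counted as separators.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5\\,6\\,7,8",
             "9,10,11"), tmp)

irregular <- find_irregular_lines(tmp)
print(irregular)
```

Line numbers in the result count the header as line 1, so they should be comparable to the line number that the reader reports in its warning. Applied to the real file, the call would be `find_irregular_lines("datos_abiertos_vigilancia_dengue.csv")`, assuming the CSV has been downloaded to the working directory.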


## test-rio-readr.R
library(rio)
packageVersion("rio")
# [1] ‘1.0.1’

d1 <- import("datos_abiertos_vigilancia_dengue.csv")
# Warning message:
# In (function (input = "", file = NULL, text = NULL, cmd = NULL,  :
#   Stopped early on line 87871. Expected 14 fields but found 16. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<PIURA,TALARA,PARIѐAS,ENACE I\,II\,III,DENGUE SIN SEÑALES DE ALARMA,2009,9,A97.0,31,200701,2007010008,23,A,F>>

nrow(d1)
# [1] 87869

library(readr)
packageVersion("readr")
# [1] ‘2.1.5’

d2 <- read_csv("datos_abiertos_vigilancia_dengue.csv")
# Rows: 501692 Columns: 14
# ── Column specification ─────────────────────────────────────────────────────────────
# Delimiter: ","
# chr (11): departamento, provincia, distrito, localidad, enfermedad, diagnostic, d...
# dbl  (3): ano, semana, edad
#
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Warning message:
# One or more parsing issues, call `problems()` on your data frame for details, e.g.:
#   dat <- vroom(...)
#   problems(dat)
nrow(d2)
# [1] 501692