@thisisnic
Last active February 9, 2023 10:17
Example showing how a combination of blank values can cause an error when reading a CSV with arrow
``` r
library(arrow)
library(dplyr)
library(stringr)
tf <- tempfile()
# values to save - note the space after the final newline, which leaves a whitespace-only x value
dodgy_vals <- "x,y\n0,1\n ,4"
cat(dodgy_vals)
#> x,y
#> 0,1
#>  ,4
writeLines(dodgy_vals, tf)
# Fails: the whitespace-only field cannot be converted to int64
open_dataset(tf, format = "csv", schema = schema(x = int64(), y = int64()), skip = 1) %>%
  collect()
#> Error in `compute.Dataset()`:
#> ! Invalid: Could not open CSV input source '/tmp/RtmpF2Lxpf/file8b1d439d4116b': Invalid: In CSV column #0: Row #3: CSV conversion error to int64: invalid value ''
#> ℹ If you have supplied a schema and your data contains a header row, you should supply the argument `skip = 1` to prevent the header being read in as data.
# Works: NA, a single space, and the empty string are all read as null
open_dataset(tf, format = "csv", schema = schema(x = int64(), y = int64()), skip = 1, null_values = c(NA, " ", "")) %>%
  collect()
#> # A tibble: 2 × 2
#>       x     y
#>   <int> <int>
#> 1     0     1
#> 2    NA     4
```
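The example above relies on knowing which blank-like strings occur in the file. As a rough sketch, assuming a simple CSV with no quoted fields, the base R helper below lists the distinct empty or whitespace-only tokens so they can be passed to `null_values`. `find_blank_tokens()` is a hypothetical name used for illustration and is not part of arrow.

``` r
# Hypothetical helper (not part of arrow): list the distinct blank-like
# tokens (empty or whitespace-only fields) found in a simple CSV file.
find_blank_tokens <- function(path, sep = ",") {
  lines <- readLines(path)
  # Naive split: assumes no quoted fields containing the separator.
  # strsplit() also drops a trailing empty field, so "a," yields only "a".
  fields <- unlist(strsplit(lines, sep, fixed = TRUE))
  unique(fields[grepl("^\\s*$", fields)])
}

tf <- tempfile()
writeLines("x,y\n0,1\n ,4", tf)
blank_tokens <- find_blank_tokens(tf)
blank_tokens
#> [1] " "
```

The result could then be combined with any other markers you expect, for example `null_values = c(blank_tokens, "")`. Because of the naive splitting, treat this as a quick diagnostic rather than a CSV parser.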