Last active February 9, 2023 10:17
Example for a user showing how blank (whitespace-only) values can cause an error when reading a CSV with arrow, and how setting `null_values` avoids it
``` r
library(arrow)
library(dplyr)
library(stringr)

tf <- tempfile()

# values to save - note the space after the final newline
dodgy_vals <- "x,y\n0,1\n ,4"
cat(dodgy_vals)
#> x,y
#> 0,1
#>  ,4
writeLines(dodgy_vals, tf)

# Fails: the blank field in column x cannot be converted to int64
open_dataset(tf, format = "csv", schema = schema(x = int64(), y = int64()), skip = 1) %>%
  collect()
#> Error in `compute.Dataset()`:
#> ! Invalid: Could not open CSV input source '/tmp/RtmpF2Lxpf/file8b1d439d4116b': Invalid: In CSV column #0: Row #3: CSV conversion error to int64: invalid value ''
#> ℹ If you have supplied a schema and your data contains a header row, you should supply the argument `skip = 1` to prevent the header being read in as data.

# Works: empty and whitespace-only strings are treated as null, so the blank field is read as NA
open_dataset(tf, format = "csv", schema = schema(x = int64(), y = int64()), skip = 1, null_values = c(NA, " ", "")) %>%
  collect()
#> # A tibble: 2 × 2
#>       x     y
#>   <int> <int>
#> 1     0     1
#> 2    NA     4
```
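
To decide which strings belong in `null_values`, it can help to list the blank fields a file actually contains before reading it with arrow. The following is a minimal base R sketch, not part of arrow: `find_blank_fields` is a hypothetical helper name, and it assumes a simple CSV with no quoted fields or embedded commas.

``` r
# Hypothetical helper (not part of arrow): return the distinct empty or
# whitespace-only fields found in a simple CSV file.
# Assumes no quoted fields and no commas embedded inside values.
find_blank_fields <- function(path, skip = 1) {
  lines <- readLines(path)
  # Drop the header row(s), mirroring the `skip` argument used above
  if (skip > 0) lines <- tail(lines, -skip)
  # strsplit() drops a trailing empty field, so append a sentinel comma first
  fields <- unlist(strsplit(paste0(lines, ","), ",", fixed = TRUE))
  # Keep only fields that are empty or consist solely of whitespace
  unique(fields[grepl("^[[:space:]]*$", fields)])
}

find_blank_fields(tf)
```

Any strings it returns are candidates to pass to `null_values` in `open_dataset()`.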