@chrissyhroberts
Created August 18, 2020 10:06
library(utf8)
# the second entry is encoded in Latin-1 but declared as UTF-8;
# the third holds valid UTF-8 bytes (0xC3 0xA7 is "ç") but is marked as "bytes"
x <- c("fa\u00E7ile", "fa\xE7ile", "fa\xC3\xA7ile")
Encoding(x) <- c("UTF-8", "UTF-8", "bytes")
try(as_utf8(x)) # fails: entry 2 is not valid UTF-8; try() lets a sourced script continue
#> Error in as_utf8(x): entry 2 has wrong Encoding; marked as "UTF-8" but leading byte 0xE7 followed by invalid continuation byte (0x69) at position 4
# mark the correct encoding for the second entry
Encoding(x[2]) <- "latin1"
as_utf8(x) # succeeds
#> [1] "façile" "façile" "façile"