@wch
Created December 18, 2015 15:31
Multibyte non-UTF-8 locales
# ==== Creating UTF-8 strings ====
# This is how to create a string with UTF-8 encoding: write out the UTF-8 bytes
# explicitly, then mark the string as UTF-8. This should work regardless of the
# current locale settings.
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"
x
# [1] "午"
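# As an alternative, a \u escape in a string literal should give the same
# character without spelling out the bytes. This is a small sketch; the name `y`
# is only illustrative, and `x` repeats the raw-byte construction above so the
# block runs on its own.
```r
# Same construction as above: explicit UTF-8 bytes, then mark as UTF-8.
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"
# "\u5348" is the Unicode escape for U+5348; R stores it internally as UTF-8.
y <- "\u5348"
identical(x, y)
# [1] TRUE
# The stored bytes are the same three bytes passed to rawToChar().
charToRaw(y)
# [1] e5 8d 88
```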
# Another string, 'Δ★😎'
pat <- rawToChar(as.raw(c(0xce, 0x94, 0xe2, 0x98, 0x85, 0xf0, 0x9f, 0x98, 0x8e)))
Encoding(pat) <- "UTF-8"
cat(pat)
# Δ★😎
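# The string above mixes 2-, 3- and 4-byte UTF-8 sequences, which makes it handy
# for checking byte-vs-character handling. A minimal sketch using base R's
# nchar() and utf8ToInt(); the code points are printed in decimal
# (U+0394, U+2605, U+1F60E).
```r
pat <- rawToChar(as.raw(c(0xce, 0x94, 0xe2, 0x98, 0x85, 0xf0, 0x9f, 0x98, 0x8e)))
Encoding(pat) <- "UTF-8"
# Nine bytes in total: 2 (Δ) + 3 (★) + 4 (😎).
nchar(pat, type = "bytes")
# [1] 9
# Decode the UTF-8 bytes into Unicode code points.
utf8ToInt(pat)
# [1]    916   9733 128526
# Going the other way rebuilds the same UTF-8 string.
identical(intToUtf8(c(0x394, 0x2605, 0x1F60E)), pat)
# [1] TRUE
```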
# =======================
# Setting locale
# =======================
# By default, Mac and most Linux systems use UTF-8 locales, but sometimes it's
# useful to switch to a multibyte, non-UTF-8 locale for testing.
# ==== Mac ====
# On a Mac, you can use a UTF-8 locale like so. (This is the default setting for
# US English.)
Sys.setlocale("LC_ALL", "en_US.UTF-8")
# To use a multibyte non-UTF-8 locale:
Sys.setlocale("LC_ALL", "ja_JP.SJIS")
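# Sys.setlocale() returns an empty string (with a warning) when the OS cannot
# honor the request, so it can be worth checking that a locale is actually
# available before relying on it. This sketch uses a hypothetical helper,
# try_locale(), which is not part of base R: it switches LC_CTYPE temporarily,
# reads l10n_info(), and restores the previous setting.
```r
# Hypothetical helper: report whether `locale` is multibyte and/or UTF-8.
try_locale <- function(locale) {
  old <- Sys.getlocale("LC_CTYPE")
  # Put the original character-type locale back when the function exits.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  # An empty string means the locale is not installed or not supported.
  if (identical(res, "")) return(c(MBCS = NA, UTF8 = NA))
  info <- l10n_info()
  c(MBCS = info$MBCS, UTF8 = info[["UTF-8"]])
}

# Where the locale exists, MBCS should be TRUE and UTF8 FALSE.
try_locale("ja_JP.SJIS")    # Mac
try_locale("ja_JP.EUC-JP")  # Ubuntu, after the setup below
```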
# ==== Ubuntu ====
# On Ubuntu, you need to enable a multibyte non-UTF-8 locale, like ja_JP.EUC-JP.
# As root, put the following in a new file /var/lib/locales/supported.d/ja:
#   ja_JP.UTF-8 UTF-8
#   ja_JP.EUC-JP EUC-JP
# After adding that file, run this in a shell (not in R):
#   sudo dpkg-reconfigure locales
# Restart R if it's already running. Then you can run the same R code as above,
# but with the ja_JP.EUC-JP locale:
Sys.setlocale("LC_ALL", "ja_JP.EUC-JP")
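# To see what a non-UTF-8 locale changes, compare the bytes of the same
# character in UTF-8 and EUC-JP. This sketch calls iconv() with toRaw = TRUE, so
# it does not depend on the session locale; it assumes the platform's iconv
# knows the "EUC-JP" encoding name (glibc and macOS typically do).
```r
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"
# Re-encode 午 as EUC-JP bytes; iconv(toRaw = TRUE) returns a list of raw vectors.
euc <- iconv(x, from = "UTF-8", to = "EUC-JP", toRaw = TRUE)[[1]]
# Three bytes in UTF-8, but two in EUC-JP.
length(euc)
# [1] 2
# Convert back to UTF-8 and compare the underlying bytes with the original.
back <- iconv(list(euc), from = "EUC-JP", to = "UTF-8")
identical(charToRaw(back), charToRaw(x))
# [1] TRUE
# In a session running under ja_JP.EUC-JP, enc2native(x) should perform the
# same UTF-8 -> EUC-JP conversion.
```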