
@yutannihilation
Last active August 29, 2015 14:10
Benchmarks of conversion from full-width (全角, zenkaku) to half-width (半角, hankaku) characters

Libraries and constants

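library(halfwidthr)      # halfwidthen()
library(Nippon)          # zen2han()
library(stringi)         # stri_trans_nfkc()
library(microbenchmark)  # microbenchmark()

# Test input: 10,000 copies of the string of ten full-width digits
a <- rep(zenkaku$number, 10000)
# chartr() translation tables: full-width digits and letters, and their
# half-width counterparts in the same order
zen <- paste0(c(zenkaku$number, zenkaku$lower, zenkaku$upper), collapse = "")
han <- paste0(c(as.character(0:9), letters, LETTERS), collapse = "")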
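If the `zenkaku` object is not at hand, the same two translation strings can be built in base R. This is a minimal sketch that assumes `zenkaku$number`, `zenkaku$lower` and `zenkaku$upper` hold the full-width digits, lower-case and upper-case letters in code-point order; it relies on the Unicode layout in which the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above ASCII U+0021 to U+007E.

```r
# Half-width targets: digits, lower-case, upper-case (same order as above)
han <- paste0(c(as.character(0:9), letters, LETTERS), collapse = "")

# Full-width sources (assumed equivalent to the zenkaku strings):
# shift every ASCII code point up by 0xFEE0, e.g. "0" (U+0030) -> U+FF10
zen <- intToUtf8(utf8ToInt(han) + 0xFEE0L)

# Quick check on a full-width string built from code points (A, B, 1)
chartr(zen, han, intToUtf8(c(0xFF21L, 0xFF22L, 0xFF11L)))
```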

Test 1: 10,000 strings of 10 characters each

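chartr() is fastest, but only marginally: its median (5.82 ms) is within 0.1 ms of zen2han() (5.89 ms). stri_trans_nfkc() is the slowest here, at about 10.7 ms.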

> microbenchmark(halfwidthen(a),
+                zen2han(a),
+                chartr(zen, han, a),
+                stri_trans_nfkc(a),
+                times = 100)
Unit: milliseconds
                expr      min        lq      mean    median        uq      max neval
      halfwidthen(a) 6.968908  7.619644  8.076640  7.962113  8.459972 10.84232   100
          zen2han(a) 5.060682  5.459276  5.931075  5.891796  6.259816  7.70020   100
 chartr(zen, han, a) 4.910458  5.440568  5.866501  5.818360  6.108197  8.58674   100
  stri_trans_nfkc(a) 9.270564 10.218115 10.739474 10.689168 11.103539 13.42656   100
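For comparison, the mapping that chartr() applies can be written out as explicit code-point arithmetic. The helper `to_hankaku()` below is hypothetical and is not one of the benchmarked functions; it is a sketch that converts every character in U+FF01 to U+FF5E and leaves all other characters untouched, so it also covers full-width punctuation, which the `zen`/`han` tables omit.

```r
# Hypothetical helper (not from halfwidthr, Nippon or stringi):
# shift full-width ASCII variants U+FF01..U+FF5E down by 0xFEE0
# and leave every other character unchanged.
to_hankaku <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                      # code points of one string
    full <- cp >= 0xFF01L & cp <= 0xFF5EL   # full-width ASCII range
    cp[full] <- cp[full] - 0xFEE0L          # map onto U+0021..U+007E
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

# Full-width digits, then full-width "A!" and an already half-width string
to_hankaku(c(intToUtf8(0xFF10L:0xFF19L), intToUtf8(c(0xFF21L, 0xFF01L)), "abc"))
```

This helper was not timed; it only makes the per-character mapping explicit.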

Test 2: 1 string of 100,000 characters

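stri_trans_nfkc() is fastest by a wide margin: about 11 ms median, against roughly 0.75 s for halfwidthen() and about 1.1 s for zen2han() and chartr(). The order of the other three is reversed from Test 1, with halfwidthen() now ahead of zen2han() and chartr(). Note that paste0(a, collapse = "") is evaluated inside each timed expression, so every figure includes the cost of building the string; since the entire stri_trans_nfkc() timing is about 11 ms, that overhead is small.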

> microbenchmark(halfwidthen(paste0(a, collapse = "")),
+                zen2han(paste0(a, collapse = "")),
+                chartr(zen, han, paste0(a, collapse = "")),
+                stri_trans_nfkc(paste0(a, collapse = "")),
+                times = 10)
Unit: milliseconds
                                       expr        min         lq       mean     median         uq       max neval
      halfwidthen(paste0(a, collapse = ""))  727.60730  734.01301  760.20788  751.32066  770.05459  840.6556    10
          zen2han(paste0(a, collapse = "")) 1048.66796 1058.16693 1078.24718 1067.45746 1085.10437 1161.9092    10
 chartr(zen, han, paste0(a, collapse = "")) 1054.91423 1069.24883 1098.96453 1094.61130 1125.69928 1145.8041    10
  stri_trans_nfkc(paste0(a, collapse = ""))   10.44722   10.79793   11.10041   11.05495   11.37788   11.7727    10
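To keep string construction out of the measurement, the long input can be built once beforehand. The sketch below uses base R's system.time() so that it runs without the benchmarked packages; the stand-in for `zenkaku$number` (the ten full-width digits U+FF10 to U+FF19) is an assumption, and only chartr() is timed.

```r
# Stand-in for zenkaku$number (an assumption): the ten full-width
# digits U+FF10..U+FF19 as a single string
zen_digits <- intToUtf8(0xFF10L:0xFF19L)
han_digits <- "0123456789"
a <- rep(zen_digits, 10000)

# Build the 100,000-character input once, outside the timed code
long_input <- paste0(a, collapse = "")

# Time 10 runs of chartr() alone on the prebuilt string
elapsed <- system.time(
  for (i in 1:10) res <- chartr(zen_digits, han_digits, long_input)
)[["elapsed"]]
elapsed / 10  # mean seconds per call
```

The same change applies to the microbenchmark() call: pass `long_input` in each expression instead of `paste0(a, collapse = "")`.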