When extracting .zip files whose file names use a Japanese encoding with p7zip in a UTF-8 locale, a double-encoded file name is created:
\u0082±\u0082ñ\u0082É\u0082¿\u0082Í\u0081I.txt
Upon closer inspection, some code points are greater than 127, but all of them are less than 256, so each one fits in a single byte.
> filename.codepoints
=> [130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129, 73, 46, 116, 120, 116]
They can be…
> filename.codepoints
.pack('C*') # ...interpreted as bytes
.force_encoding('CP932') # ...of CP932 encoding
.encode('UTF-8')
=> "こんにちは!.txt"
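The same steps can be wrapped in a small helper so that file names which were never mangled pass through unchanged. This is only a sketch: the method name repair_cp932_name is hypothetical, and the guards (skip names with code points of 256 or more, skip pure ASCII, skip byte sequences that are not valid CP932) are assumptions about what is safe to reinterpret, not something p7zip documents.

```ruby
# Hypothetical helper (name chosen for illustration): undo the double
# encoding of a single file name, or return the name unchanged.
def repair_cp932_name(name)
  codepoints = name.codepoints

  # Only code points below 256 can be reinterpreted as single bytes.
  return name unless codepoints.all? { |cp| cp < 256 }
  # Pure ASCII names are identical in CP932 and UTF-8; nothing to repair.
  return name if codepoints.all? { |cp| cp < 128 }

  # Same steps as above: code points -> bytes -> CP932 -> UTF-8.
  candidate = codepoints.pack('C*').force_encoding('CP932')

  # Assumption: if the bytes are not valid CP932, the name was not mangled.
  return name unless candidate.valid_encoding?

  candidate.encode('UTF-8')
rescue Encoding::UndefinedConversionError
  # A CP932 character without a Unicode mapping: leave the name alone.
  name
end

mangled = "\u0082±\u0082ñ\u0082É\u0082¿\u0082Í\u0081I.txt"
puts repair_cp932_name(mangled)     # prints こんにちは!.txt
puts repair_cp932_name("plain.txt") # prints plain.txt
```

The validity check is a heuristic. A genuine Latin-1 name whose bytes happen to form valid CP932 would still be rewritten, so it is worth reviewing the results before renaming files in bulk.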