dtinth/mojibake.md

## mojibake.md

      
When extracting .zip files whose file names use a Japanese encoding (CP932) with p7zip in a UTF-8 locale, a double-encoded file name is created:

    \u0082±\u0082ñ\u0082É\u0082¿\u0082Í\u0081I.txt

Upon closer inspection, some code points are > 127 and all of them are < 256.

    > filename.codepoints
    => [130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129, 73, 46, 116, 120, 116]
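A minimal sketch of how such a name could arise, assuming each CP932 byte of the stored name is mapped to the code point with the same numeric value. This is an illustration of the effect, not a description of p7zip's internals, and the variable names are hypothetical:

```ruby
# Hypothetical reproduction; `original`, `cp932_bytes` and `garbled`
# are illustrative names, not part of any tool.
original = "こんにちは！.txt"

# Bytes of the name as stored in the archive (assumed to be CP932).
cp932_bytes = original.encode('CP932').bytes

# Treat every byte as a code point of the same value; 'U*' builds a
# UTF-8 string from those code points.
garbled = cp932_bytes.pack('U*')

p garbled.codepoints == cp932_bytes   # code points mirror the CP932 bytes
p garbled.codepoints.max < 256        # and every one fits in a single byte
```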

They can be…

    > filename.codepoints
        .pack('C*')                     # ...interpreted as (unsigned) bytes
        .force_encoding('CP932')        # ...of CP932 encoding
        .encode('UTF-8')                # ...and converted to UTF-8
    => "こんにちは！.txt"
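The same steps can be wrapped in a small helper for renaming many files. The sketch below is only a heuristic: the method name `fix_mojibake` and its `source_encoding` parameter are hypothetical, and it assumes a name is mojibake only when every code point is below 256 and the resulting bytes are valid in the source encoding.

```ruby
# Hypothetical helper; the name and the default encoding are assumptions.
def fix_mojibake(name, source_encoding = 'CP932')
  codepoints = name.codepoints

  # Names with code points >= 256 cannot be byte-for-byte mojibake.
  return name unless codepoints.all? { |cp| cp < 256 }
  # Pure ASCII names need no repair.
  return name if codepoints.all? { |cp| cp < 128 }

  # Reinterpret the code points as bytes of the source encoding.
  repaired = codepoints.pack('C*').force_encoding(source_encoding)
  return name unless repaired.valid_encoding?

  repaired.encode('UTF-8')
rescue EncodingError
  # Leave the name untouched if the conversion fails.
  name
end

garbled = [130, 177, 130, 241, 130, 201, 130, 191,
           130, 205, 129, 73, 46, 116, 120, 116].pack('U*')
puts fix_mojibake(garbled)
```

Because the check is heuristic, a genuine Latin-1 name whose bytes happen to form valid CP932 could be converted by mistake, so it is worth reviewing the proposed names before renaming anything.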