@happygiraffe
Created December 10, 2011 22:14
Encoding test
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) throws IOException {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
        // U+0100 == LATIN CAPITAL LETTER A WITH MACRON
        String filename = "test - \u0100dam";
        File outputFile = new File(filename);
        // Create a file with a UTF-8 name.
        FileOutputStream fos = new FileOutputStream(outputFile);
        try {
            byte[] contents = filename.getBytes(Charset.forName("UTF-8"));
            fos.write(contents);
        } finally {
            if (fos != null) {
                fos.close();
            }
        }
        System.out.println("Created \"" + filename + "\"");
    }
}
@happygiraffe
Author

Here's a sample run on my Mac (Mac OS X 10.6).

% java -cp bin Main 
file.encoding=MacRoman
sun.jnu.encoding=MacRoman
Created "test - ?dam"
% ls
bin/        src/        test - Ādam
% ls |xxd
0000000: 6269 6e2f 0a73 7263 2f0a 7465 7374 202d  bin/.src/.test -
0000010: 2041 cc84 6461 6d0a                       A..dam.

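So, the console output shows "?" because U+0100 can't be represented in MacRoman, but the filename on disk is still correct UTF-8. Note that it is stored in decomposed form: 41 is "A" and cc 84 is U+0304 COMBINING MACRON, rather than the precomposed c4 80.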
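
For anyone wondering where the 41 cc 84 in the xxd output comes from, here's a rough sketch (not part of the original test). It assumes the filesystem stores names in decomposed form, which as far as I know is what HFS+ does, and approximates that with `java.text.Normalizer`. The class name `DecomposedName` and its helpers are made up for illustration, `StandardCharsets` needs Java 7 or later, and US-ASCII stands in for MacRoman to show the "?" substitution because it is guaranteed to be available on every JVM.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class DecomposedName {
    // Hex-dump the UTF-8 encoding of a string, e.g. "41 cc 84".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Canonical decomposition (NFD): base letter followed by combining marks.
    // Assumption: this approximates what the filesystem does to the name.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Encode with a charset that cannot represent the character, then decode
    // again. Unmappable characters are replaced during encoding.
    static String roundTripAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String precomposed = "\u0100";
        System.out.println("precomposed: " + utf8Hex(precomposed));
        System.out.println("decomposed:  " + utf8Hex(decompose(precomposed)));
        System.out.println("US-ASCII:    " + roundTripAscii(precomposed));
    }
}
```

The string inside the JVM is the same in every run; only the byte encoding chosen at each boundary (console vs. filesystem) differs.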

Now, trying again with file.encoding set to UTF-8:

% rm test*
% java -Dfile.encoding=UTF-8 -cp bin Main
file.encoding=UTF-8
sun.jnu.encoding=MacRoman
Created "test - Ādam"
% ls
bin/        src/        test - Ādam
% ls |xxd
0000000: 6269 6e2f 0a73 7263 2f0a 7465 7374 202d  bin/.src/.test -
0000010: 2041 cc84 6461 6d0a                       A..dam.
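
If you'd rather not depend on -Dfile.encoding for the console message, one option is to wrap System.out in a PrintStream with an explicit encoding. This is only a sketch, not something the test above does: the class `Utf8Console` and its helpers are hypothetical, and it assumes the terminal itself expects UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wrap any stream in a PrintStream that always encodes as UTF-8,
    // regardless of the file.encoding system property.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Capture the bytes that would be written for a message.
    static byte[] render(String message) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = utf8Stream(buf);
        ps.print(message);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("Created \"test - \u0100dam\"");
    }
}
```

This only affects what is written to the console; the filename bytes on disk were the same in both runs above.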

And once more, this time also setting sun.jnu.encoding to UTF-8:

% rm test*
% java -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 -cp bin Main
file.encoding=UTF-8
sun.jnu.encoding=UTF-8
Created "test - Ādam"
% ls |xxd
0000000: 6269 6e2f 0a73 7263 2f0a 7465 7374 202d  bin/.src/.test -
0000010: 2041 cc84 6461 6d0a                       A..dam.
