@nemo-kaz
Created January 14, 2016 23:34
// Character encoding detection
// Recursively detects the code page of each source file under the current directory
@Grab(group='com.ibm.icu', module='icu4j', version='56.1')
import com.ibm.icu.text.CharsetDetector

def detector = new CharsetDetector()
// UTF-8, UTF-16, UTF-32, Windows-31j
// ISO-8859-1, ISO-8859-2, windows-1250, windows-1252, Big5, UTF-16LE
new File(".").eachFileRecurse { file ->
    // Only inspect regular files whose names end with a known source suffix
    if (file.isFile() &&
        (file.name.endsWith("TXT") ||
         file.name.endsWith("RPGLE") ||
         file.name.endsWith("CLP") ||
         file.name.endsWith("PF") ||
         file.name.endsWith("LF") ||
         file.name.endsWith("DSPF") ||
         file.name.endsWith("PRTF") ||
         file.name.endsWith("cpy") ||
         file.name.endsWith("txt") ||
         file.name.endsWith("java") ||
         file.name.endsWith("text") ||
         file.name.endsWith("cbl") ||
         file.name.endsWith("jcl"))) {
        // Read the raw bytes and let ICU pick its best guess (detect() may return null)
        def bytes = file.bytes
        def fileCodepage = detector.setText(bytes).detect()?.name ?: "unknown"
        // For these guesses, print "NoKanji" unless the text (read with the default
        // charset) begins with a character outside printable ASCII and half-width
        // kana; only the first character is tested
        if (fileCodepage =~ /ISO-8859-1|ISO-8859-2|windows-1250|windows-1252|Big5|UTF-16LE/) {
            if (!(file.text =~ /^[^ -~。-゚]/)) { print "NoKanji " }
        }
        print fileCodepage + "\t"
        println file.getAbsolutePath().minus(".\\")
    }
}
@nemo-kaz

Recursively detects the code page (character encoding) of source files.
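The long chain of `endsWith` checks in the script could also be written as a lookup over a list of suffixes. The following is only a sketch of that idea: `SUFFIXES` and `hasSourceSuffix` are hypothetical names that do not appear in the gist, and the list simply mirrors the suffixes the script already tests.

```groovy
// Hypothetical helper, not part of the gist: the list mirrors the
// suffixes checked in the script.
def SUFFIXES = ['TXT', 'RPGLE', 'CLP', 'PF', 'LF', 'DSPF', 'PRTF',
                'cpy', 'txt', 'java', 'text', 'cbl', 'jcl']

// True when the name ends with any listed suffix. Like the original
// endsWith calls, the match is case-sensitive and needs no leading dot.
def hasSourceSuffix = { String name -> SUFFIXES.any { name.endsWith(it) } }

println hasSourceSuffix('ORDERS.RPGLE')   // true
println hasSourceSuffix('notes.md')       // false
```

Inside `eachFileRecurse`, the condition would then shrink to something like `file.isFile() && hasSourceSuffix(file.name)`, and adding a new file type becomes a one-item change to the list.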
