@PurpleBooth
Last active August 29, 2015 14:20
<?php
// Require exactly one argument: the directory to process.
if (count($argv) !== 2) {
    echo "Recursively convert a directory of XML files to UTF-8 and escape any unescaped & symbols\n";
    echo "Usage: {$argv[0]} directory\n";
    exit(1);
}
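// realpath() returns false when the path does not exist, so check before iterating.
$path = realpath($argv[1]);
if ($path === false || !is_dir($path)) {
    echo "Not a directory: {$argv[1]}\n";
    exit(1);
}
$objects = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path),
    RecursiveIteratorIterator::SELF_FIRST
);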
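foreach ($objects as $name => $object) {
    // Only touch regular files; a directory named "foo.xml" would otherwise be read.
    if ($object->isFile() && mb_strtolower($object->getExtension()) === 'xml') {
        $contents = file_get_contents($object->getRealPath());
        // Detection is a best guess; it returns false when no candidate encoding fits.
        $encoding = mb_detect_encoding($contents);
        echo "{$name}: Probably " . ($encoding ?: 'Unknown') . ", converting to UTF-8\n";
        // Pass the detected source encoding. Without it, mb_convert_encoding()
        // falls back to the configured internal/default encoding and ignores the detection.
        $decoded = $encoding
            ? mb_convert_encoding($contents, 'UTF-8', $encoding)
            : mb_convert_encoding($contents, 'UTF-8');
        // Escape an & whose following run of word characters is not closed by ";".
        $regex = "/(?!<\\!\\[CDATA\\[)(?:&([^;\\W]*([^;\\w]|$)))(?!\\]\\]>)/";
        $replacement = "&amp;$1";
        $decoded = preg_replace($regex, $replacement, $decoded);
        file_put_contents($object->getRealPath(), $decoded);
    }
}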
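
The ampersand-escaping step can be tried on its own before running the script over a whole directory. The sketch below is not the regex used above. It swaps in a different, commonly used technique: a negative lookahead that leaves an `&` alone only when it already begins a named, decimal, or hexadecimal entity reference. The function name `escapeBareAmpersands` is hypothetical, entity names are assumed to be ASCII only, and no attempt is made to skip CDATA sections.

```php
<?php
// Hypothetical helper: escape every & that does not already start an
// entity reference such as &amp;, &#38; or &#x26;.
function escapeBareAmpersands(string $xml): string
{
    // Negative lookahead: the & is matched only when it is NOT followed by
    // an ASCII name, decimal number, or hex number that ends in ";".
    $pattern = '/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';

    return preg_replace($pattern, '&amp;', $xml);
}

echo escapeBareAmpersands('<name>AT&T</name>'), "\n";
echo escapeBareAmpersands('<name>Tom &amp; Jerry &#38; co</name>'), "\n";
```

Because existing references are left untouched, running the helper twice on the same string should give the same result as running it once, which matters if the script is re-run over a directory that was already converted.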