@Ruzzz
Last active January 15, 2017 14:12
A function that prepares HTML for processing with DOM, fixing problems with Cyrillic text.
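```php
<?php
/**
 * Prepares an HTML string for loading into DOM: converts it to UTF-8,
 * strips <script>, <style> and <noscript> blocks, cleans it up with Tidy,
 * and replaces all existing <meta> tags with a single UTF-8 charset
 * declaration so that Cyrillic text is decoded correctly.
 */
function prepareForDOM($html, $inCharset) {
    // Convert to UTF-8 unless the input already is (case-insensitive check,
    // so 'UTF-8' and 'utf-8' are treated the same).
    $charset = strtolower($inCharset);
    if ($charset != 'utf8' && $charset != 'utf-8') {
        $html = iconv($inCharset, 'UTF-8//TRANSLIT', $html);
    }

    // Remove <script>, <style> and <noscript> elements together with their contents.
    $html = preg_replace('/<(script|style|noscript)\b[^>]*>.*?<\/\1\b[^>]*>/is', '', $html);

    $tidy = new tidy;
    $config = array( // See http://tidy.sourceforge.net/docs/quickref.html
        'drop-font-tags'              => true,
        'drop-proprietary-attributes' => true,
        'hide-comments'               => true,
        'indent'                      => true,
        'logical-emphasis'            => true,
        'numeric-entities'            => true,
        'output-xhtml'                => true,
        'wrap'                        => 0,
        //'vertical-space'            => true
    );
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    $html = $tidy->value;

    // Drop every existing <meta> tag, including stale charset declarations...
    $html = preg_replace('#<meta[^>]+>#isu', '', $html);
    // ...and declare UTF-8 immediately after <head>.
    $html = preg_replace('#<head\b[^>]*>#isu', "<head>\r\n<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />", $html);
    return $html;
}
```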
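The following sketch illustrates why the function ends by injecting a UTF-8 charset `<meta>` tag. It is a self-contained PHP script that needs only the standard `dom` extension (not Tidy) and does not call `prepareForDOM()`; the helper `firstParagraphText()` is hypothetical, written only for this demonstration. With the declaration present, `DOMDocument::loadHTML()` decodes the Cyrillic text correctly. Without one, the libxml2 HTML parser behind `DOMDocument` has historically fallen back to a non-UTF-8 default encoding, so the same bytes usually come out garbled (the exact result depends on the libxml2 version).

```php
<?php
// Hypothetical helper for this demo: parse $html with DOMDocument and
// return the text of the first <p> element.
function firstParagraphText($html) {
    $doc = new DOMDocument();
    // Collect libxml warnings about non-conforming markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc->getElementsByTagName('p')->item(0)->textContent;
}

// Same UTF-8 body; only the first document declares its charset,
// using the same <meta> tag that prepareForDOM() inserts.
$withMeta = "<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head>"
          . "<body><p>Привет, мир</p></body></html>";
$withoutMeta = "<html><head></head><body><p>Привет, мир</p></body></html>";

echo firstParagraphText($withMeta), PHP_EOL;    // Привет, мир
echo firstParagraphText($withoutMeta), PHP_EOL; // result depends on libxml2's default encoding
```

In practice, the string returned by `prepareForDOM()` can be passed straight to `DOMDocument::loadHTML()`, since it already carries this declaration.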