@tenman
Created February 22, 2010 00:08
Escapes "<" and ">" characters that are not part of an HTML tag. For use when tidy cannot be used.
<?php
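/**
* Escapes "<" and ">" characters that are not part of an HTML tag.
*
* Recognised tags are left intact; every other angle bracket is
* converted to a numeric character reference (&#60; or &#62;).
*/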
$doc=<<<DOC
<html lang="ja" dir="ltr">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-script-type" content="text/javascript">
<meta http-equiv="content-style-type" content="text/css">
<meta name="author" content="nobita">
<title>another &gt;&lt;</title>
</head>
<body>
<div id="wrapper">
<div id="head">
title
</div>
<div id="doc">
aaaaa>bbbbb
eeee<dddddd>>>><<<<<
&gt;&lt;The person who writes html will come to want to write the source.
</div>
<div id="foot">
address
</div>
</div>
</body>
</html>
...
DOC;
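/**
* Escapes the whole document, then restores the recognised tags one by one.
* Uses result.txt in the current directory as working storage and echoes the result.
*/
function escape_tag($doc){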
$html4 = '(?i:title|a|abbr|acronym|address|applet|area|b|base|basefont|bdo|big|blockquote|body|br|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dl|dt|em|fieldset|font|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|hr|html|i|iframe|img|input|ins|isindex|kbd|label|legend|li|link|map|menu|meta|noframes|noscript|object|ol|optgroup|option|p|param|pre|q|s|samp|script|select|small|span|strike|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|tr|tt|u|ul|var)';
$html5 = '(?i:html|head|title|base|link|meta|style|script|noscript|body|section|nav|article|aside|h1|h2|h3|h4|h5|h6|hgroup|header|footer|address|p|hr|br|pre|blockquote|ol|ul|li|dl|dt|dd|a|q|cite|em|strong|small|mark|dfn|abbr|time|progress|meter|code|var|samp|kbd|sub|sup|span|i|b|bdo|ruby|rt|rp|ins|del|figure|figcaption|img|iframe|embed|object|param|video|audio|source|canvas|map|area|table|caption|colgroup|col|tbody|thead|tfoot|tr|td|th|form|fieldset|label|input|button|select|datalist|optgroup|option|textarea|keygen|output|details|summary|command|menu|legend|div)';
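// Remove line breaks
$doc = str_replace("\n","",$doc);
$doc_escape = str_replace(array("<",">"),array("&#60;","&#62;"),$doc);
// Save the fully escaped document
file_put_contents("result.txt",$doc_escape);
// Restore each recognised tag via search_tag(). Only $html4 is used here;
// $html5 is defined above but not referenced.
preg_replace_callback("!(</?$html4)(>|\s[^>]+>)!",'search_tag',$doc);
// Display the result
echo file_get_contents("result.txt");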
}
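/**
* Callback for preg_replace_callback(): $matches[0] is one tag found in the original document.
*/
function search_tag($matches){
$doc2 = file_get_contents("result.txt");
// Key used to find the tag in the escaped document
$key = str_replace(array("<",">"),array("&#60;","&#62;"),$matches[0]);
// Write the tag back as HTML
$doc2 = str_replace($key,$matches[0],$doc2);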
file_put_contents("result.txt",$doc2);
}
escape_tag($doc);
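
/*
* A minimal in-memory sketch of the same idea, offered as an alternative
* rather than as part of the original gist. Instead of escaping everything
* and restoring tags through result.txt, it splits the document on the
* recognised tags and escapes only the text between them.
*
* Assumptions: the function name escape_non_tags and its $tags parameter are
* hypothetical; $tags is expected to be a non-capturing group in the same
* form as $html4 / $html5 above. Unlike escape_tag(), this sketch does not
* remove line breaks and returns the result instead of echoing it.
*/
function escape_non_tags($doc, $tags){
// One capture group around the whole tag, so that preg_split() returns the
// tags as well as the text between them (PREG_SPLIT_DELIM_CAPTURE).
$pattern = "!(</?$tags(?:>|\s[^>]+>))!";
$parts = preg_split($pattern, $doc, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $i => $part) {
// Even indexes hold plain text, odd indexes hold the matched tags.
if ($i % 2 === 0) {
$parts[$i] = str_replace(array("<",">"),array("&#60;","&#62;"),$part);
}
}
return implode("", $parts);
}
// Usage sketch:
// echo escape_non_tags("a>b <div>x</div>", '(?i:div|p)');
// prints: a&#62;b <div>x</div>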