
@andris9
Created August 2, 2010 19:33
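/*
 * Strip unwanted HTML tags and unwanted attributes
 */
var limitHTML = function (html) {
    // Allowed tags (the original list had an empty "||" alternative after h3)
    var re = /^(a|b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|img|li|ol|p|pre|sup|sub|strong|strike|ul|br|hr)$/,
        // Allowed attributes for specific tags; all other tags keep no attributes
        attribs = {
            "img": /^(src|width|height|alt|title)$/,
            "a": /^(href|title)$/
        };

    // Match a tag, or a stray "<" / ">" so that input like "<<script>script>"
    // cannot be reassembled into a new tag once the inner tag is removed
    return html.replace(/<(\/?)\s*([\w:\-]+)([^>]*)>|[<>]/g, function (original, lslash, tag, params) {
        if (!tag) {
            return original == "<" ? "&lt;" : "&gt;";
        }
        var attr, kept = [],
            rslash = params.substr(-1) == "/" ? "/" : "";
        tag = tag.toLowerCase();
        if (!re.test(tag)) {
            return ""; // unwanted tag: drop the tag, keep the text around it
        }
        if (lslash || !(attr = attribs[tag])) {
            return "<" + lslash + tag + rslash + ">";
        }
        // Collect only quoted, allowed attributes and rebuild the tag from them,
        // so unquoted values (onclick=...) and leftover text are discarded too.
        // "*?" (not "+?") lets empty values like alt="" end at their own quote.
        params.replace(/([\w:\-]+)\s*=\s*(["'])([\s\S]*?)\2/g, function (match, name, quot, value) {
            name = name.toLowerCase();
            if (attr.test(name)) {
                // Neutralize javascript: links regardless of case or embedded whitespace
                if (name == "href" && /^javascript:/.test(value.replace(/[\x00-\x20]+/g, "").toLowerCase())) {
                    value = "#";
                }
                kept.push(name + "=" + quot + value + quot);
            }
            return match;
        });
        return "<" + tag + (kept.length ? " " + kept.join(" ") : "") + rslash + ">";
    });
};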

andris9 commented Aug 2, 2010

Side note: "the job that needs to get done" here is stripping unwanted HTML tags from an HTML string. re holds all the allowed tags, and attribs holds the allowed attributes for specific tags; tags with no entry in attribs lose all of their attributes. If an attribute is not in the allowed list, it is deleted (a good way to get rid of unwanted event handlers such as onclick). Additionally, href="javascript:..." is replaced with href="#".
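The javascript: check in limitHTML compares the attribute text as written, so encoded variants such as `&#106;avascript:` or `javascript&colon;` are not caught. A stricter option is to allow only known URL schemes instead of blocking one. The sketch below is a hypothetical helper, not part of the gist: `isSafeUrl`, `decodeForCheck` and `SAFE_SCHEMES` are made-up names, and it assumes http, https and mailto are the only schemes you want to keep, treating anything without a scheme as a relative URL.

```javascript
// Hypothetical helper (not part of the gist): allowlist of URL schemes.
// Assumption: these are the only schemes you want to keep in href values.
var SAFE_SCHEMES = /^(https?|mailto):$/;

// Decode the character references browsers would decode inside an
// attribute value, so encoded schemes are visible to the check.
// Only used for checking; the original value is what gets emitted.
function decodeForCheck(value) {
    return value
        .replace(/&#x([0-9a-f]+);?/gi, function (m, hex) {
            return String.fromCharCode(parseInt(hex, 16));
        })
        .replace(/&#(\d+);?/g, function (m, dec) {
            return String.fromCharCode(parseInt(dec, 10));
        })
        .replace(/&colon;/gi, ":")
        .replace(/&(tab|newline);/gi, "")
        // Browsers ignore whitespace and control characters in the scheme
        .replace(/[\x00-\x20]+/g, "");
}

function isSafeUrl(value) {
    // A scheme is the text up to the first ":" that precedes any "/", "?" or "#"
    var match = /^([^\/?#]*?:)/.exec(decodeForCheck(value));
    if (!match) {
        return true; // no scheme: relative URL such as "/path" or "page.html#top"
    }
    return SAFE_SCHEMES.test(match[1].toLowerCase());
}
```

Inside limitHTML, the href branch could then read `if (name == "href" && !isSafeUrl(value)) value = "#";`. The allowlist fails closed: an unexpected scheme such as data: is rejected without having to be listed anywhere.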
