@kentbrew
Last active June 2, 2022 19:49
Using DOMParser to clean HTML input
<!doctype html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>I wrote this because I find cleaning strings with regular expressions psychically unfulfilling.</p>
<table border="1" cellpadding="10" cellspacing="0">
<tr><td>Dirty</td><td>Clean</td></tr>
</table>
<script>
var test = [
  '%E6%B2%B3%E8%B1%9A',
  '&lt;script>alert("&#21488;&#21271;");&lt;/script>',
  '<these> <are /> not valid <html> tags</these>',
  '&#X000003C;script>alert("ding");&#X000003C;/script>',
  '\x3Cscript>alert 1;\x3C/script> how now brown cow',
  '&lt;img src=x onerror="alert(0)" />',
  'javascript:alert("ding");'
];
// Parse the input as HTML and keep only its text, repeating until another
// pass no longer changes the string, then escape any remaining "<".
var clean = function (input) {
  var testMe = input, dupeTest;
  do {
    // remember the value before this pass so we can tell when it stops changing
    dupeTest = testMe;
    testMe = new DOMParser().parseFromString(testMe, "text/html").documentElement.textContent;
  } while (testMe !== dupeTest);
  return testMe.replace(/</g, '&lt;');
};
// Add one table row per test string: raw input on the left, cleaned on the right.
var table = document.getElementsByTagName('TABLE')[0];
for (var i = 0; i < test.length; i = i + 1) {
  var tr = document.createElement('TR');
  var dirty = document.createElement('TD');
  dirty.textContent = test[i];
  tr.appendChild(dirty);
  var cleaned = document.createElement('TD');
  cleaned.innerHTML = clean(test[i]);
  tr.appendChild(cleaned);
  table.appendChild(tr);
}
</script>
</body>
</html>

Old School:

  var clean = function (input) {
    var testMe = input, dupeTest;
    do {
      // remember the value before this pass so we can tell when it stops changing
      dupeTest = testMe;
      testMe = new DOMParser().parseFromString(testMe, "text/html").documentElement.textContent;
    } while (testMe !== dupeTest);
    return testMe.replace(/</g, '&lt;');
  };

ES6 Hotness:

const clean = input => {
  let testMe = input, dupeTest;
  do {
    // remember the value before this pass so we can tell when it stops changing
    dupeTest = testMe;
    testMe = new DOMParser().parseFromString(testMe, "text/html").documentElement.textContent;
  } while (testMe !== dupeTest);
  return testMe.replace(/</g, '&lt;');
};
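
Why the loop matters: input that has been entity-encoded more than once needs more than one pass before it stops changing. The sketch below separates the "repeat until stable" loop from the decoding step so the loop can be tried without a browser. The names `cleanWith`, `domDecode`, `toyDecode` and the `maxPasses` cap are hypothetical and not part of the gist, and `toyDecode` is a deliberately tiny stand-in for DOMParser, not a real sanitizer.

```javascript
// Hypothetical helper: apply `decodeOnce` until the string stops changing.
// `maxPasses` is an assumed safety cap, not something the gist has.
function cleanWith(decodeOnce, input, maxPasses = 10) {
  let current = String(input);
  for (let pass = 0; pass < maxPasses; pass += 1) {
    const next = decodeOnce(current);
    if (next === current) break; // another pass changed nothing, so stop
    current = next;
  }
  // Same final step as the gist: escape any "<" that is left.
  return current.replace(/</g, '&lt;');
}

// Browser decoder, as in the gist. DOMParser is only defined in browsers,
// so this function is declared here but only called on a page.
const domDecode = (s) =>
  new DOMParser().parseFromString(s, 'text/html').documentElement.textContent;

// Stand-in decoder for trying the loop without a DOM. It strips simple tags
// and decodes only &lt; and &amp;, which is enough to show multiple passes.
const toyDecode = (s) =>
  s.replace(/<[^>]*>/g, '').replace(/&lt;/g, '<').replace(/&amp;/g, '&');

// Double-encoded markup takes several passes to settle.
console.log(cleanWith(toyDecode, '&amp;lt;b>hi&amp;lt;/b>')); // prints "hi"
```

On a page, the equivalent call would be `cleanWith(domDecode, userInput)`. With `maxPasses` set to 1 the same input comes back as `&lt;b>hi&lt;/b>`, which is what a single decoding pass leaves behind.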