How to use AntiSamy-java from a node.js application to sanitize HTML
var java = require('java');
var util = require('util');
// -------------------------------------------------
// How to use AntiSamy from Node to sanitize HTML
// -------------------------------------------------
// Sanitizing HTML input is a science in itself, so it is better not to reinvent the wheel.
// the AntiSamy main website is
// I used the Java version of AntiSamy through the latest node-java bridge interface.
// Thanks JoeFerner
// I used the 'Sync' versions of all the node-java calls for simplicity. It looks to me like
// the only IO happens when you load the policy and create the AntiSamy instance, so
// the scanning itself can be synchronous. If you want, load the policy and the AntiSamy instance asynchronously.
// to download and install AntiSamy, go to
// download the Develope_Guide.pdf and follow its instructions
// I used the most recent versions of the following required jar files
java.classpath.push('batik-css.jar'); // from > batik-1.7/lib/batik-css.jar
java.classpath.push('xercesImpl.jar'); // from > xerces-2_11_0/xercesImpl.jar
java.classpath.push('xml-apis.jar'); // from > xerces-2_11_0/xml-apis.jar
java.classpath.push('antisamy-1.4.4.jar'); // from the antisamy download
java.classpath.push('nekohtml.jar'); // from
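// A minimal sketch of the asynchronous setup mentioned above, assuming the
// callback forms of node-java's callStaticMethod and newInstance.
// loadSanitizer and its parameter names are hypothetical (not part of AntiSamy
// or node-java); the bridge object is passed in so the function stands alone.
function loadSanitizer(bridge, policyFile, callback) {
    // load the policy file without blocking the event loop
    bridge.callStaticMethod("org.owasp.validator.html.Policy", "getInstance", policyFile, function (err, loadedPolicy) {
        if (err) { return callback(err); }
        // create the AntiSamy scanner once the policy is available
        bridge.newInstance("org.owasp.validator.html.AntiSamy", function (err2, scanner) {
            if (err2) { return callback(err2); }
            // hand back a function that scans synchronously, as the rest of this file does
            callback(null, function sanitize(input) {
                return scanner.scanSync(input, loadedPolicy).getCleanHTMLSync();
            });
        });
    });
}
// possible usage: loadSanitizer(java, 'antisamy-slashdot-1.4.4.xml', function (err, sanitize) { /* ... */ });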
var i;
// the string to scan
var s = "<p><script>alert();</script></p><span></span>";
// get a policy object loaded with the desired policy
var policy = java.callStaticMethodSync("org.owasp.validator.html.Policy","getInstance","antisamy-slashdot-1.4.4.xml");
// get an instance of the AntiSamy class
var as = java.newInstanceSync("org.owasp.validator.html.AntiSamy");
// scan the string
var cr = as.scanSync(s,policy);
// get the sanitized HTML
var html = cr.getCleanHTMLSync();
// get a Java ArrayList of the messages generated by the scan
var elist = cr.getErrorMessagesSync();
// get the length of the ArrayList
var len = elist.sizeSync();
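// A small sketch for copying the Java list of scan messages into a plain
// JavaScript array, which is easier to log or return from a web handler.
// toMessageArray is a hypothetical helper name; it assumes only the node-java
// Sync wrappers sizeSync and getSync for java.util.List's size() and get(int).
function toMessageArray(javaList) {
    var out = [];
    var n = javaList.sizeSync();
    for (var k = 0; k < n; ++k) {
        // coerce each entry to a JavaScript string
        out.push(String(javaList.getSync(k)));
    }
    return out;
}
// possible usage: var messages = toMessageArray(elist);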
// display the sanitized HTML
console.log(html);
// display scan messages
for (i = 0; i < len; ++i) {
    console.log(elist.getSync(i));
}
/* Output for the string "<p><script>alert();</script></p><span></span>"
<p />
The script tag is not allowed for security reasons. This tag should not affect the display of the input.
The span tag was empty, and therefore we could not process it. The rest of the message is intact, and its removal should not have any side effects.
*/