HTML 5 validation in Java (based on the Nu HTML Checker)
<!-- Add this in your POM -->
<dependency>
    <groupId>nu.validator</groupId>
    <artifactId>validator</artifactId>
    <version>15.3.14</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
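import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import nu.validator.messages.MessageEmitter;
import nu.validator.messages.MessageEmitterAdapter;
import nu.validator.messages.TextMessageEmitter;
import nu.validator.servlet.imagereview.ImageCollector;
import nu.validator.source.SourceCode;
import nu.validator.validation.SimpleDocumentValidator;
import nu.validator.xml.SystemErrErrorHandler;
import org.xml.sax.InputSource;

/**
 * Verifies that HTML content is valid.
 * @param htmlContent the HTML content
 * @return true if it is valid, false otherwise
 * @throws Exception if the schema cannot be loaded or the content cannot be checked
 */
public boolean validateHtml( String htmlContent ) throws Exception {

    InputStream in = new ByteArrayInputStream( htmlContent.getBytes( StandardCharsets.UTF_8 ));
    // Receives the formatted messages written by the emitter.
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    SourceCode sourceCode = new SourceCode();
    ImageCollector imageCollector = new ImageCollector(sourceCode);
    boolean showSource = false;
    MessageEmitter emitter = new TextMessageEmitter( out, false );
    MessageEmitterAdapter errorHandler = new MessageEmitterAdapter( sourceCode, showSource, imageCollector, 0, false, emitter );
    // Ignore warnings and info messages; only errors are reported.
    errorHandler.setErrorsOnly( true );

    SimpleDocumentValidator validator = new SimpleDocumentValidator();
    validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());
    validator.setUpValidatorAndParsers( errorHandler, true, false );
    validator.checkHtmlInputSource( new InputSource( in ));

    // The content is valid when the handler recorded no errors.
    return 0 == errorHandler.getErrors();
}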

@orbatschow commented Apr 18, 2015

How can I collect all errors (ideally in a format of my choice, such as JSON)?
And what is this line for?

validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());

@raboof commented Nov 19, 2016

The errors flow from SimpleDocumentValidator into the MessageEmitterAdapter into the TextMessageEmitter into the ByteArrayOutputStream.

To actually see them, you'll have to call errorHandler.end(...) before reading from out.

I agree a nicer way to collect the errors programmatically would be great, but I haven't seen anything particularly nice yet.
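
One possible way to collect the errors programmatically is to skip the emitter chain and pass your own SAX error handler instead. The sketch below is not part of the Nu HTML Checker: CollectingErrorHandler, Problem, countErrors and toJson are hypothetical names, and it relies only on the JDK's org.xml.sax types. It assumes that setUpValidatorAndParsers accepts any org.xml.sax.ErrorHandler as its first argument, which I have not verified against version 15.3.14.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical helper (not a Nu HTML Checker class): keeps every reported problem in memory. */
final class CollectingErrorHandler implements ErrorHandler {

    /** One reported problem: severity, position, and message. */
    public static final class Problem {
        public final String severity;
        public final int line;
        public final int column;
        public final String message;

        Problem(String severity, SAXParseException e) {
            this.severity = severity;
            this.line = e.getLineNumber();
            this.column = e.getColumnNumber();
            this.message = e.getMessage();
        }
    }

    private final List<Problem> problems = new ArrayList<>();

    @Override public void warning(SAXParseException e) { problems.add(new Problem("warning", e)); }
    @Override public void error(SAXParseException e) { problems.add(new Problem("error", e)); }
    @Override public void fatalError(SAXParseException e) { problems.add(new Problem("fatal", e)); }

    public List<Problem> getProblems() { return problems; }

    /** Counts everything that is not a warning. */
    public long countErrors() {
        return problems.stream().filter(p -> !p.severity.equals("warning")).count();
    }

    /** Renders the collected problems as a JSON array, without any JSON library. */
    public String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < problems.size(); i++) {
            Problem p = problems.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"severity\":\"").append(p.severity)
              .append("\",\"line\":").append(p.line)
              .append(",\"column\":").append(p.column)
              .append(",\"message\":\"").append(escape(p.message)).append("\"}");
        }
        return sb.append(']').toString();
    }

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    private static String escape(String s) {
        if (s == null) return "";
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

In the method from the gist, this would mean creating a CollectingErrorHandler, passing it as the first argument of validator.setUpValidatorAndParsers( ..., true, false ) in place of the MessageEmitterAdapter, and returning 0 == collector.countErrors(). Note that anything the MessageEmitterAdapter does on top of plain SAX error handling (such as setErrorsOnly) would no longer apply.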
