@vincent-zurczak
Last active September 26, 2021 18:50
HTML 5 validation in Java (based on the Nu HTML Checker)
<!-- Add this in your POM -->
<dependency>
    <groupId>nu.validator</groupId>
    <artifactId>validator</artifactId>
    <version>15.3.14</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
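import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import nu.validator.messages.MessageEmitter;
import nu.validator.messages.MessageEmitterAdapter;
import nu.validator.messages.TextMessageEmitter;
import nu.validator.servlet.imagereview.ImageCollector;
import nu.validator.source.SourceCode;
import nu.validator.validation.SimpleDocumentValidator;
import nu.validator.xml.SystemErrErrorHandler;
import org.xml.sax.InputSource;

/**
 * Verifies that HTML content is valid.
 * @param htmlContent the HTML content
 * @return true if it is valid, false otherwise
 * @throws Exception if the validator cannot be set up or the content cannot be checked
 */
public boolean validateHtml( String htmlContent ) throws Exception {

    InputStream in = new ByteArrayInputStream( htmlContent.getBytes( "UTF-8" ));
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Messages go from the adapter to the text emitter, which writes into "out"
    SourceCode sourceCode = new SourceCode();
    ImageCollector imageCollector = new ImageCollector(sourceCode);
    boolean showSource = false;
    MessageEmitter emitter = new TextMessageEmitter( out, false );
    MessageEmitterAdapter errorHandler = new MessageEmitterAdapter( sourceCode, showSource, imageCollector, 0, false, emitter );
    errorHandler.setErrorsOnly( true );

    SimpleDocumentValidator validator = new SimpleDocumentValidator();
    validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());
    validator.setUpValidatorAndParsers( errorHandler, true, false );
    validator.checkHtmlInputSource( new InputSource( in ));

    return 0 == errorHandler.getErrors();
}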
@orbatschow

How can I collect all the errors (maybe even in a format of my choice, such as JSON)?
And what is this line for?

validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());

@raboof

raboof commented Nov 19, 2016

The errors flow from SimpleDocumentValidator into the MessageEmitterAdapter into the TextMessageEmitter into the ByteArrayOutputStream.

To actually see them, you'll have to call errorHandler.end(...) before reading from out.

I agree a nicer way to 'programmatically' collect the errors would be great, but I haven't seen anything particularly nice yet.
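
One possible way to collect the errors programmatically is sketched below. It is not taken from the gist or from the Nu HTML Checker: CollectingErrorHandler is a hypothetical helper that relies only on the JDK's org.xml.sax.ErrorHandler interface. It assumes that setUpValidatorAndParsers accepts any SAX ErrorHandler (the gist passes a MessageEmitterAdapter there), which should be checked against the validator version in use. The JSON escaping is deliberately minimal.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not part of the Nu HTML Checker): keeps every reported
// error in memory so that it can be inspected or serialized afterwards.
final class CollectingErrorHandler implements ErrorHandler {

    private final List<SAXParseException> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // Warnings are dropped in this sketch; only errors are collected.
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        errors.add(e);
    }

    int getErrorCount() {
        return errors.size();
    }

    // Serializes the collected errors as a JSON array of
    // {"line": ..., "column": ..., "message": ...} objects.
    String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < errors.size(); i++) {
            SAXParseException e = errors.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"line\":").append(e.getLineNumber())
              .append(",\"column\":").append(e.getColumnNumber())
              .append(",\"message\":\"").append(escape(e.getMessage())).append("\"}");
        }
        return sb.append(']').toString();
    }

    // Minimal escaping: backslashes, double quotes and newlines only.
    private static String escape(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

Under that assumption, validateHtml could pass an instance of this class to setUpValidatorAndParsers in place of errorHandler, return 0 == handler.getErrorCount(), and read handler.toJson() once checkHtmlInputSource has returned.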
