@mcenirm
Created May 22, 2012 20:47
Convert HTML to XML using TagSoup
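// TagSoup's parser is a SAX XMLReader that accepts malformed HTML.
XMLReader tagSoupReader = new org.ccil.cowan.tagsoup.Parser();
// newTransformer() with no stylesheet yields an identity transform.
Transformer identityTransformer = TransformerFactory.newInstance().newTransformer();
// Note: FileReader decodes using the platform default charset.
Reader sourceReader = new FileReader(sourceFile);
InputSource sourceInputSource = new InputSource(sourceReader);
// Feed TagSoup's SAX events into the transformer and serialize them as XML.
Source xmlSource = new SAXSource(tagSoupReader, sourceInputSource);
Result outputTarget = new StreamResult(outputFile);
identityTransformer.transform(xmlSource, outputTarget);
sourceReader.close();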
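
The snippet above omits its imports and enclosing class. A minimal sketch of the same pipeline as a reusable method might look like the following. The class name `HtmlToXml` and the method `convert` are hypothetical, not part of TagSoup or the original gist, and the `XMLReader` is passed in as a parameter so the block compiles without TagSoup on the classpath.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of TagSoup or the original gist.
final class HtmlToXml {
    private HtmlToXml() {}

    // Parses `in` with the supplied SAX reader and writes the resulting
    // event stream to `out` as XML. The caller owns and closes both streams.
    static void convert(XMLReader reader, Reader in, Writer out)
            throws TransformerException {
        // With no stylesheet argument, newTransformer() returns an identity transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // SAXSource pairs the reader with its input; the transformer drives the parse.
        SAXSource source = new SAXSource(reader, new InputSource(in));
        identity.transform(source, new StreamResult(out));
    }
}
```

With TagSoup available, the reader argument would be `new org.ccil.cowan.tagsoup.Parser()`, and the `Reader` and `Writer` could be a `FileReader` and `FileWriter` opened in a try-with-resources block so they are closed even if the transform fails.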