@aknishiumi
Created February 1, 2012 03:11
Notes on using CyberNeko to parse HTML into a DOM while keeping the markup as close to its original form as possible. ref: http://qiita.com/items/1962
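import java.io.StringReader;
import org.apache.xerces.parsers.DOMParser;
import org.cyberneko.html.HTMLConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Use the Xerces DOMParser, configured with NekoHTML's HTMLConfiguration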
DOMParser parser = new DOMParser(new HTMLConfiguration());
parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
parser.setFeature("http://cyberneko.org/html/features/balance-tags/ignore-outside-content", true);
parser.setFeature("http://cyberneko.org/html/features/scanner/notify-builtin-refs", true);
parser.setProperty("http://cyberneko.org/html/properties/names/elems", "match");
parser.setProperty("http://cyberneko.org/html/properties/names/attrs", "no-change");
parser.setProperty("http://cyberneko.org/html/properties/default-encoding", "UTF-8");
parser.parse(new InputSource(new StringReader(layout)));
Document doc = parser.getDocument();
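
One way to check how much of the original markup survived is to serialize the resulting `Document` back to a string and compare it with the input. Below is a minimal sketch using only the JDK's `javax.xml.transform` API. The class name `DomDump`, the helper `toXmlString`, and the stand-in document built with `DocumentBuilderFactory` are assumptions for illustration and are not part of the original note. In practice `doc` would be the document returned by `parser.getDocument()`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomDump {
    // Hypothetical helper: serialize a DOM Document to a string.
    // The "xml" output method is used so that element and attribute
    // names are written exactly as they are stored in the DOM.
    static String toXmlString(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document (assumption); replace with parser.getDocument().
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element div = doc.createElement("Div");
        div.setAttribute("dataFoo", "bar");
        div.setTextContent("hello");
        doc.appendChild(div);

        System.out.println(toXmlString(doc));
    }
}
```

The mixed-case names in the stand-in document are deliberate: with `names/elems` set to `"match"` and `names/attrs` set to `"no-change"`, the parser is asked to leave name casing alone, so a round trip like this is a quick way to spot where the output differs from the input.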