@tenderlove
Created April 21, 2009 21:34
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>

<p>
hello world
</p>
</body>
</html>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>
#include <libxml/HTMLparser.h>
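int main(int argc, char* argv[]) {
  /* The path of the HTML file to parse is the only argument. */
  if (argc < 2) {
    fprintf(stderr, "usage: %s <file.html>\n", argv[0]);
    return EXIT_FAILURE;
  }
  printf("parsing %s\n", argv[1]);
  htmlDocPtr doc = htmlReadFile((const char *)argv[1], NULL,
      HTML_PARSE_RECOVER
  );
  assert(doc != NULL);
  xmlXPathContextPtr xpathCtx = xmlXPathNewContext(doc);
  assert(xpathCtx != NULL);
  /* libxml2 takes xmlChar * strings, so the literal needs BAD_CAST. */
  xmlXPathObjectPtr xpathObj = xmlXPathEvalExpression(BAD_CAST "//p", xpathCtx);
  assert(xpathObj != NULL);
  /* nodesetval can be NULL when the expression matches nothing. */
  printf("found %d nodes\n",
      xpathObj->nodesetval ? xpathObj->nodesetval->nodeNr : 0);
  /* Release everything in reverse order of allocation. */
  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);
  xmlFreeDoc(doc);
  return EXIT_SUCCESS;
}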