@krisleech
Created October 31, 2011 22:40
Using JSoup in JRuby
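require 'java'
# Load the JSoup jar that sits next to this script.
mydir = File.expand_path(File.dirname(__FILE__))
require File.join(mydir, 'jsoup-1.6.1.jar')
# java_import is preferred over import, which can clash with Rake's import.
java_import 'org.jsoup.Jsoup'
java_import 'org.jsoup.safety.Whitelist'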
header = <<-HEADER
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
HEADER
footer = <<-FOOTER
</body>
</html>
FOOTER
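# Deliberately malformed markup: unquoted attributes, an unclosed <table tag, missing </td> tags.
inner = "<table id=table1 cellspacing=2px <h1>CONTENT</h1> <td><a href=index.html>1 -> Home Page</a> <td><a href=intro.html>2 -> Introduction</a> "
# Full XHTML page built from the pieces above. It is not cleaned below; swap it in for html to experiment.
page = header + inner + footer
# Fragment with stray closing tags; this is the input that actually gets cleaned.
html = '<p>dsfsdfsdf</p></strong> </strong></strong> <a href="http://google.com/dsf.html">dsfsdF</a> <br>'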
whitelist = org.jsoup.safety.Whitelist.relaxed
puts org.jsoup.Jsoup.clean(html, whitelist)
@krisleech

I'm using this instead of Nokogiri because of a JRuby-specific bug that makes Nokogiri's output invalid XHTML:
https://github.com/tenderlove/nokogiri/issues/557
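
The gist builds an XHTML header and footer but never combines them with the cleaned output. A rough sketch of how that could look is below. The helper names `wrap_in_xhtml` and `clean_fragment` and the two constants are made up for illustration and are not part of the gist. `clean_fragment` only works under JRuby with `jsoup-1.6.1.jar` next to the script. Whether the resulting page actually validates depends on the jsoup version and its output settings, which this sketch does not check.

```ruby
# Hypothetical helpers, not part of the original gist.

XHTML_HEADER = <<-HEADER
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
HEADER

XHTML_FOOTER = <<-FOOTER
</body>
</html>
FOOTER

# Pure Ruby: place an already-cleaned body fragment between header and footer.
def wrap_in_xhtml(fragment)
  XHTML_HEADER + fragment.strip + "\n" + XHTML_FOOTER
end

# JRuby only: clean a fragment with JSoup's relaxed whitelist.
# Assumes jsoup-1.6.1.jar sits next to this file, as in the gist.
def clean_fragment(html)
  require 'java'
  require File.join(File.expand_path(File.dirname(__FILE__)), 'jsoup-1.6.1.jar')
  org.jsoup.Jsoup.clean(html, org.jsoup.safety.Whitelist.relaxed)
end
```

Under JRuby, `puts wrap_in_xhtml(clean_fragment(html))` would print the cleaned fragment inside the XHTML skeleton. Keeping the wrapping separate from the cleaning means the pure-Ruby part can be exercised without a JVM.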
