A happier, groovier way to parse RTF: apache_tika + XmlSlurper
package org.akhikhl.test

import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.ParseContext
import org.apache.tika.parser.rtf.RTFParser

class ParseRtf {

  def parse(String rtfText) {
    // not validating, not ns-aware
    XmlSlurper slurper = new XmlSlurper(false, false)
    InputStream rtfStream = new ByteArrayInputStream(rtfText.getBytes())
    // XmlSlurper is a SAX ContentHandler, so Tika can stream its XHTML events straight into it
    new RTFParser().parse(rtfStream, slurper, new Metadata(), new ParseContext())
    // after parsing, slurper.document is the root <html> element
    slurper.document.'body'.p.each { p ->
      println "Got paragraph: ${p.text()}"
    }
  }
}
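A possible way to try this out as a standalone script is sketched below. It is only a sketch under stated assumptions: the `@Grab` coordinates and version are a guess at a Tika 1.x artifact that bundles `RTFParser` and should be adjusted to your build, `rtfParagraphs` is a hypothetical helper (not part of the gist) that returns the paragraph texts instead of printing them, the sample RTF string is made up, and the script assumes a Groovy version where `XmlSlurper` resolves without an explicit import.

```groovy
// Assumed coordinates and version; adjust to whatever Tika artifact your build uses
@Grab('org.apache.tika:tika-parsers:1.28.5')
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.ParseContext
import org.apache.tika.parser.rtf.RTFParser

// Hypothetical helper: same technique as ParseRtf.parse, but returns the texts
List<String> rtfParagraphs(String rtfText) {
  // not validating, not ns-aware, as in the class above
  def slurper = new XmlSlurper(false, false)
  def rtfStream = new ByteArrayInputStream(rtfText.getBytes('US-ASCII'))
  new RTFParser().parse(rtfStream, slurper, new Metadata(), new ParseContext())
  // collect the text of every <p> under <body> into a plain list
  slurper.document.body.p.collect { it.text() }
}

// Made-up sample document with two paragraphs
def sample = '{\\rtf1\\ansi First paragraph.\\par Second paragraph.\\par}'
rtfParagraphs(sample).each { println "Got paragraph: $it" }
```

Returning a list rather than printing keeps the parsing step separate from whatever the caller wants to do with the paragraphs.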