Created July 17, 2013 12:57
melix/6020336
Convert Confluence HTML export into asciidoc
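@Grab('net.sourceforge.htmlcleaner:htmlcleaner:2.4') // version is an assumption
import org.htmlcleaner.*

def src = new File('html').toPath()      // root of the Confluence HTML export
def dst = new File('asciidoc').toPath()  // output root

def cleaner = new HtmlCleaner()
def props = cleaner.properties
props.translateSpecialEntities = false
def serializer = new SimpleHtmlSerializer(props)

src.toFile().eachFileRecurse { f ->
    def relative = src.relativize(f.toPath())
    def target = dst.resolve(relative)
    if (f.isDirectory()) {
        target.toFile().mkdirs()  // mirror the source directory tree
    } else if (f.name.endsWith('.html')) {
        def tmpHtml = File.createTempFile('clean', 'html')
        println "Converting $relative"
        def result = cleaner.clean(f)
        result.traverse({ tagNode, htmlNode ->
            tagNode?.attributes?.remove 'class'
            if ('td' == tagNode?.name || 'th' == tagNode?.name) {
                // Flatten table cells to plain text and turn headers into regular cells
                tagNode.name = 'td'
                String txt = tagNode.text
                tagNode.removeAllChildren()
                tagNode.insertChild(0, new ContentNode(txt))
            }
            true  // keep traversing
        } as TagNodeVisitor)
        // Write the cleaned HTML to a temp file, then let pandoc convert it
        serializer.writeToFile(result, tmpHtml.absolutePath, "utf-8")
        "pandoc -f html -t asciidoc -R -S --normalize -s $tmpHtml -o ${target}.adoc".execute().waitFor()
        tmpHtml.delete()
    }/* else {
        "cp html/$relative $target".execute()
    }*/
}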
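
The pandoc call in the script is a single string, which Groovy splits on whitespace before executing, so a path containing a space would be passed as several arguments. Below is a minimal sketch of the same call built as a list instead. `pandocCommand` is a hypothetical helper name, not part of the gist, and the `-R -S --normalize` flags are left out on the assumption that pandoc 2.0 or later is installed, which to my knowledge no longer accepts them.

```groovy
// Hypothetical helper (not in the original gist): builds the pandoc
// invocation as a list so each path stays a single argument.
// ASSUMPTION: pandoc 2.0+ is in use, so -R, -S and --normalize are omitted.
List<String> pandocCommand(File input, String outputBase) {
    ['pandoc', '-f', 'html', '-t', 'asciidoc', '-s',
     input.absolutePath, '-o', "${outputBase}.adoc".toString()]
}

def cmd = pandocCommand(new File('html/my page.html'), 'asciidoc/my page.html')
println cmd.join(' | ')
// cmd.execute().waitFor()   // uncomment to actually run pandoc
```

In the script, this would replace the string-based call, with `tmpHtml` as the input and `target.toString()` as the output base.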

thugcee commented Mar 26, 2024

Is line 22, `tagNode?.attributes?.remove 'class'`, a good idea? For me it breaks the conversion of code blocks.
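
One possible workaround, sketched under the assumption that code blocks in the export are identified by a `class` on `pre` or `code` elements (the gist does not confirm how Confluence marks them): strip `class` everywhere except on those tags. `keepClassOn` and `shouldStripClass` are hypothetical names, not part of the original script.

```groovy
// Hypothetical sketch (not in the original gist).
// ASSUMPTION: code blocks carry their marker class on <pre> or <code>.
def keepClassOn = ['pre', 'code'] as Set

// Returns true when the 'class' attribute of the given tag should be removed.
def shouldStripClass = { String tagName ->
    !(tagName?.toLowerCase() in keepClassOn)
}

// Inside the traverse closure, the unconditional removal would become:
//   if (shouldStripClass(tagNode?.name)) tagNode?.attributes?.remove 'class'

println shouldStripClass('td')   // table cells still lose their class
println shouldStripClass('pre')  // code blocks keep theirs
```

Whether this is enough depends on what pandoc needs in order to recognize the block, so the tag list may need adjusting for a given export.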
