Last active
April 20, 2016 14:12
Reading RTF with Tika. Unfortunately, I could not get the text color or font size.
;;; `boot repl` to go
(set-env! :dependencies '[[org.apache.tika/tika-parsers "1.10"]])

(import '(org.apache.tika metadata.Metadata
                          parser.ParseContext
                          parser.rtf.RTFParser)
        '(java.io StringWriter FileInputStream)
        '(javax.xml.transform sax.SAXTransformerFactory
                              stream.StreamResult
                              OutputKeys))

(let [metadata (Metadata.)
      sw       (StringWriter.)
      factory  (cast SAXTransformerFactory (SAXTransformerFactory/newInstance))
      ;; SAX handler that serializes the parser's events into sw
      handler  (doto (.newTransformerHandler factory)
                 (.. getTransformer (setOutputProperty OutputKeys/METHOD "xml"))
                 (.. getTransformer (setOutputProperty OutputKeys/INDENT "no"))
                 (.setResult (StreamResult. sw)))]
  ;; with-open closes the input stream once parsing finishes
  (with-open [in (FileInputStream. "1.rtf")]
    (.parse (RTFParser.) in handler metadata (ParseContext.)))
  (-> sw .toString prn))
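Bold text came through wrapped in b tags.
Both xml and html output appear to be selectable through the output properties listed at
http://docs.oracle.com/javase/7/docs/api/javax/xml/transform/OutputKeys.html
Extracting colors and the like does not look possible.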
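The output-method switch mentioned above can be tried without Tika or an RTF file. This is only a rough sketch: `make-handler` and `emit-sample` are hypothetical helper names, and the hand-written SAX events for `<p><b>bold</b></p>` merely stand in for whatever `RTFParser` would actually emit.

```clojure
(import '(java.io StringWriter)
        '(javax.xml.transform OutputKeys)
        '(javax.xml.transform.sax SAXTransformerFactory TransformerHandler)
        '(javax.xml.transform.stream StreamResult)
        '(org.xml.sax.helpers AttributesImpl))

(defn make-handler
  "Hypothetical helper: builds a TransformerHandler that serializes SAX
  events into sw using the given output method, e.g. \"xml\" or \"html\"."
  [^StringWriter sw ^String method]
  (let [^SAXTransformerFactory factory (SAXTransformerFactory/newInstance)]
    (doto (.newTransformerHandler factory)
      (.. getTransformer (setOutputProperty OutputKeys/METHOD method))
      (.. getTransformer (setOutputProperty OutputKeys/INDENT "no"))
      (.setResult (StreamResult. sw)))))

(defn emit-sample
  "Hypothetical helper: feeds SAX events for <p><b>bold</b></p> to a
  handler (an assumed stand-in for parser output) and returns the
  serialized string."
  [method]
  (let [sw (StringWriter.)
        ^TransformerHandler h (make-handler sw method)
        no-attrs (AttributesImpl.)
        text (.toCharArray "bold")]
    (doto h
      (.startDocument)
      (.startElement "" "p" "p" no-attrs)
      (.startElement "" "b" "b" no-attrs)
      (.characters text 0 (alength text))
      (.endElement "" "b" "b")
      (.endElement "" "p" "p")
      (.endDocument))
    (str sw)))

;; Compare the two serializations at the REPL
(prn (emit-sample "xml"))
(prn (emit-sample "html"))
```

In the gist's script, the same idea would amount to changing the `OutputKeys/METHOD` value passed to `setOutputProperty`. The two results typically differ in details such as the XML declaration.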
I followed the approach described in
http://stackoverflow.com/questions/10170570/problems-parsing-a-table-inside-an-rtf-file-using-apache-tika
via http://m12i.hatenablog.com/entry/20120814/1344874785