Quick evaluation on 126 PDFs, with some preprocessing.
- reading is very fast (3 to 4 seconds in total for the two BinaryCasReader runs below)
- writing with BinaryCasWriter without compression is much slower than with compression (about 601 s vs. about 10 s spent in the writer). Where is the catch?
- Maybe this is the catch: "Be advised that binary serialization may have drawbacks compared to XMI serialization."
- BinaryCasWriter with compression yields a very decent file size (17 MB, vs. 358 MB uncompressed)
114283ms 59.44% Process ch.epfl.bbp.uima.pdf.cr.PdfCollectionReader
4505ms 2.34% Analysis ch.epfl.bbp.uima.ae.TokenAnnotator
51355ms 26.71% Analysis ch.epfl.bbp.uima.ae.PosTagAnnotator
21752ms 11.31% Analysis ch.epfl.bbp.uima.ae.BlueBioLemmatizer
192263ms TOTAL TIME (3 minutes and 12 seconds)
112566ms 14.23% Process ch.epfl.bbp.uima.pdf.cr.PdfCollectionReader
4922ms 0.62% Analysis ch.epfl.bbp.uima.ae.TokenAnnotator
50227ms 6.35% Analysis ch.epfl.bbp.uima.ae.PosTagAnnotator
22071ms 2.79% Analysis ch.epfl.bbp.uima.ae.BlueBioLemmatizer
601061ms 75.96% Analysis de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasWriter
791257ms TOTAL TIME (13 minutes and 11 seconds)
110638ms 54.83% Process ch.epfl.bbp.uima.pdf.cr.PdfCollectionReader
5394ms 2.67% Analysis ch.epfl.bbp.uima.ae.TokenAnnotator
52900ms 26.22% Analysis ch.epfl.bbp.uima.ae.PosTagAnnotator
22734ms 11.27% Analysis ch.epfl.bbp.uima.ae.BlueBioLemmatizer
9670ms 4.79% Analysis de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasWriter
201777ms TOTAL TIME (3 minutes and 21 seconds)
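
To make the writer timings above easier to compare, here is a minimal sketch that converts a component's total time into an average cost per document. It assumes that every run processed all 126 PDFs, and that the 601061 ms run is the uncompressed one (inferred from the summary at the top, not stated in the logs). The class and method names are hypothetical.

```java
import java.util.Locale;

class PerDocumentCost {
    // Average time per document, in milliseconds.
    static double msPerDocument(long totalMs, int documents) {
        return (double) totalMs / documents;
    }

    public static void main(String[] args) {
        int documents = 126;          // assumption: every run covered all 126 PDFs
        long slowWriterMs = 601061;   // BinaryCasWriter, second run (assumed: no compression)
        long fastWriterMs = 9670;     // BinaryCasWriter, third run (assumed: with compression)

        System.out.printf(Locale.ROOT, "slow writer: %.1f ms per PDF%n",
                msPerDocument(slowWriterMs, documents));
        System.out.printf(Locale.ROOT, "fast writer: %.1f ms per PDF%n",
                msPerDocument(fastWriterMs, documents));
    }
}
```

Under these assumptions the slow writer costs several seconds per PDF, while the fast one stays well under 100 ms per PDF.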
3600ms 83.92% Process de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader
660ms 15.38% Analysis ch.epfl.bbp.uima.ae.AddHeaderFromDkproMetadata
24ms 0.56% Analysis ch.epfl.bbp.uima.ae.StatsAnnotatorPlus
4290ms TOTAL TIME (4 seconds)
2407ms 77.15% Process de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader
663ms 21.25% Analysis ch.epfl.bbp.uima.ae.AddHeaderFromDkproMetadata
41ms 1.31% Analysis ch.epfl.bbp.uima.ae.StatsAnnotatorPlus
3120ms TOTAL TIME (3 seconds)
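
The percentages in the logs above can be rechecked against the TOTAL TIME lines. The following is a sketch only: it assumes the `<millis>ms <percent>% <phase> <component>` layout seen in these logs, and the `TimingLine` class is a hypothetical helper, not part of UIMA or DKPro.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimingLine {
    // Matches e.g. "3600ms 83.92% Process some.package.SomeReader"
    private static final Pattern LINE =
            Pattern.compile("^(\\d+)ms\\s+([\\d.]+)%\\s+(\\S+)\\s+(\\S+)$");

    final long millis;
    final double reportedPercent;
    final String phase;
    final String component;

    TimingLine(long millis, double reportedPercent, String phase, String component) {
        this.millis = millis;
        this.reportedPercent = reportedPercent;
        this.phase = phase;
        this.component = component;
    }

    // Parses one per-component line; TOTAL TIME lines are not handled here.
    static TimingLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a timing line: " + line);
        }
        return new TimingLine(Long.parseLong(m.group(1)),
                Double.parseDouble(m.group(2)), m.group(3), m.group(4));
    }

    // Share of the given total run time, in percent.
    double percentOf(long totalMillis) {
        return 100.0 * millis / totalMillis;
    }

    public static void main(String[] args) {
        TimingLine t = parse(
                "3600ms 83.92% Process de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader");
        System.out.printf(Locale.ROOT, "%s: reported %.2f%%, recomputed %.2f%%%n",
                t.component, t.reportedPercent, t.percentOf(4290));
    }
}
```

For the sample line, the recomputed share against the 4290 ms total agrees with the reported 83.92% to two decimals.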
126 PDFs, chosen at random
PDFs: 391 MB
serialized: 358 MB
serialized compressed: 17 MB
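
The size figures above translate into reduction factors as follows. This is a small sketch of the arithmetic only; the class name `SizeRatios` and its method are made up for illustration.

```java
import java.util.Locale;

class SizeRatios {
    // How many times smaller the compressed output is than a reference size.
    static double factor(double referenceMb, double compressedMb) {
        return referenceMb / compressedMb;
    }

    public static void main(String[] args) {
        double pdfMb = 391;        // original PDFs
        double serializedMb = 358; // binary CAS, no compression
        double compressedMb = 17;  // binary CAS, with compression

        System.out.printf(Locale.ROOT, "vs. uncompressed serialization: %.1fx smaller%n",
                factor(serializedMb, compressedMb));
        System.out.printf(Locale.ROOT, "vs. original PDFs: %.1fx smaller%n",
                factor(pdfMb, compressedMb));
    }
}
```

With the numbers reported here, the compressed output is roughly 21 times smaller than the uncompressed serialization and 23 times smaller than the source PDFs.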