@takahashim
Last active August 29, 2015 14:03

XML output

The new -out argument can be used to output an XML file containing information extracted from the input EPUB file.

Calling java -jar epubcheck-3.0.1.jar -out output.xml file.epub generates the file output.xml, which contains information about file.epub.
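If the check needs to be scripted, the same invocation can be wrapped in a few lines of Python. This is only a sketch: the function names are hypothetical, the jar is assumed to be reachable at the path you pass in, and treating a non-zero exit code as "the EPUB has problems" rather than as a crash is an assumption, not something stated here.

```python
import subprocess
from pathlib import Path


def build_epubcheck_command(jar_path, epub_path, report_path):
    """Return the argument list for the -out invocation described above."""
    return ["java", "-jar", str(jar_path), "-out", str(report_path), str(epub_path)]


def run_epubcheck(jar_path, epub_path, report_path):
    """Run epubcheck and return the report path, or None if no report was written."""
    command = build_epubcheck_command(jar_path, epub_path, report_path)
    # check=False: a non-zero exit code is not raised as an exception here,
    # on the assumption that it signals validation errors in the EPUB.
    subprocess.run(command, check=False)
    report = Path(report_path)
    return report if report.exists() else None
```

For example, run_epubcheck("epubcheck-3.0.1.jar", "file.epub", "output.xml") runs the command shown above and returns the path of output.xml when the file exists afterwards.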

The output file uses the JHOVE schema (available at http://hul.harvard.edu/ois/xml/xsd/jhove/jhove.xsd; see also the project at http://sourceforge.net/projects/jhove/), whose generic property structure allows properties of any type to be output.

Here is an example of XML output (for src/test/resources/30/epub/invalid/invalid-ncx.epub with epubcheck-3.0.1):

<?xml version="1.0" encoding="UTF-8"?>
<jhove xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="epubcheck" release="3.0.1" date="2013-05-10">
 <date>2014-07-02T01:48:43+09:00</date>
 <repInfo uri="invalid-ncx.epub">
  <created>2011-09-01T17:16:54Z</created>
  <lastModified>2011-09-01T17:18:00Z</lastModified>
  <format>application/epub+zip</format>
  <version>3.0</version>
  <status>Not well-formed</status>
  <messages>
   <message>ERROR: /EPUB/lorem.ncx(20): 'ch1a': fragment identifier is not defined in 'EPUB/lorem.xhtml'</message>
   <message>ERROR: /EPUB/lorem.ncx(26): 'ch2b': fragment identifier is not defined in 'EPUB/lorem.xhtml'</message>
  </messages>
  <mimeType>application/epub+zip</mimeType>
  <properties>
   <property><name>CharacterCount</name><values arity="Scalar" type="Long"><value>6165</value></values></property>
   <property><name>Language</name><values arity="Scalar" type="String"><value>la</value></values></property>
   <property><name>Info</name><values arity="List" type="Property">
    <property><name>Identifier</name><values arity="Scalar" type="String"><value>urn:uuid:550e8400-e29b-41d4-a716-4466674412314</value></values></property>
    <property><name>CreationDate</name><values arity="Scalar" type="Date"><value>2011-09-01T17:16:54Z</value></values></property>
    <property><name>ModDate</name><values arity="Scalar" type="Date"><value>2011-09-01T17:18:00Z</value></values></property>
    <property><name>Title</name><values arity="Scalar" type="String"><value>Lorem Ipsum</value></values></property>
    <property><name>Date</name><values arity="Scalar" type="String"><value>2011-09-01</value></values></property>
   </values></property>
  </properties>
 </repInfo>
</jhove>
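A report like the one above can be read with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the function name summarize_report and the shape of the returned dictionary are hypothetical, and only the element names and the namespace URI come from the sample. Note that every element lives in the JHOVE namespace, so unprefixed lookups such as find("repInfo") would match nothing.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <jhove> root element above.
NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}


def summarize_report(xml_data):
    """Return status, messages, and top-level scalar properties of a report.

    xml_data may be a str or bytes; a report without <repInfo> is not handled.
    """
    root = ET.fromstring(xml_data)
    rep_info = root.find("j:repInfo", NS)
    summary = {
        "uri": rep_info.get("uri"),
        "status": rep_info.findtext("j:status", namespaces=NS),
        "messages": [m.text for m in rep_info.findall("j:messages/j:message", NS)],
        "properties": {},
    }
    # Only direct children of <properties> are visited; nested lists such as
    # "Info" (arity="List") are skipped in this sketch.
    for prop in rep_info.findall("j:properties/j:property", NS):
        values = prop.find("j:values", NS)
        if values is not None and values.get("arity") == "Scalar":
            name = prop.findtext("j:name", namespaces=NS)
            summary["properties"][name] = values.findtext("j:value", namespaces=NS)
    return summary
```

Passing the bytes of output.xml to summarize_report gives a dictionary whose "messages" entry holds the text of each message element, which is usually the part a build script cares about.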

If you need to work with another schema, you can create an XSL stylesheet to transform this output. If you prefer to output another kind of information directly, you must use the second method explained just below.
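To illustrate the kind of reshaping such a stylesheet would perform, here is a rough Python equivalent. It is not XSLT: Python's standard library has no XSLT processor, so the sketch does the transformation with xml.etree.ElementTree instead. The target format (a report element with status and message children) is entirely hypothetical, and splitting each message at the first ": " into a severity and a body is an assumption based only on the "ERROR: ..." messages in the sample.

```python
import xml.etree.ElementTree as ET

JHOVE_NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}


def to_simple_report(jhove_xml):
    """Reshape a JHOVE-style report into a hypothetical <report> document."""
    rep_info = ET.fromstring(jhove_xml).find("j:repInfo", JHOVE_NS)
    report = ET.Element("report", {"file": rep_info.get("uri", "")})
    status = rep_info.findtext("j:status", default="", namespaces=JHOVE_NS)
    ET.SubElement(report, "status").text = status
    for message in rep_info.findall("j:messages/j:message", JHOVE_NS):
        text = message.text or ""
        # Assumed message layout: "<SEVERITY>: <body>", as in the sample.
        severity, separator, body = text.partition(": ")
        if not separator:
            severity, body = "", text
        ET.SubElement(report, "message", {"severity": severity}).text = body
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(report, encoding="unicode")
```

An XSL stylesheet doing the same job would match the same namespaced elements; the advantage of a stylesheet is that it can be applied by any XSLT processor without writing code.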
