@bitsgalore
Last active August 8, 2019 14:06

Proposed jpylyzer output format changes

Johan van der Knijff, 3 July 2019

This document describes proposed changes to the jpylyzer output format for the upcoming jpylyzer 2.0 release (planned for November 2019). The main reason for these changes is the addition of raw codestream validation. Since this functionality requires a small (but nevertheless breaking) change to jpylyzer's output format, this is a good moment to fix a few other inconsistencies as well.

Related GitHub issues are:

The modifications as described below have already been implemented in the testcodestream development branch of jpylyzer.

Current (jpylyzer 1.x) output format

Output for 1 single file:

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-1-1.xsd">
    <toolInfo>
        <toolName>jpylyzer</toolName>
        <toolVersion>1.18.0</toolVersion>
    </toolInfo>
    <fileInfo>
        <fileName>aware.jp2</fileName>
        <filePath>/home/johan/jpylyzer-test-files/aware.jp2</filePath>
        <fileSizeInBytes>662735</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValidJP2>True</isValidJP2>
    <tests/>
    <properties>
        ::
    </properties>
</jpylyzer>

Output for 2 files with --wrapper option enabled (this wraps multiple jpylyzer elements inside a results element, which is the root element in this case):

<?xml version='1.0' encoding='UTF-8'?>
<results xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-1-1.xsd">
<jpylyzer>
    <toolInfo>
        <toolName>jpylyzer</toolName>
        <toolVersion>1.18.0</toolVersion>
    </toolInfo>
    <fileInfo>
        <fileName>aware.jp2</fileName>
        <filePath>/home/johan/jpylyzer-test-files/aware.jp2</filePath>
        <fileSizeInBytes>662735</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValidJP2>True</isValidJP2>
    <tests/>
    <properties>
        ::
    </properties>
</jpylyzer>
<jpylyzer>
    <toolInfo>
        <toolName>jpylyzer</toolName>
        <toolVersion>1.18.0</toolVersion>
    </toolInfo>
    <fileInfo>
        <fileName>rubbish.jp2</fileName>
        <filePath>/home/johan/jpylyzer-test-files/rubbish.jp2</filePath>
        <fileSizeInBytes>662735</fileSizeInBytes>
        <fileLastModified>Wed Dec  5 09:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValidJP2>True</isValidJP2>
    <tests/>
    <properties>
        ::
    </properties>
</jpylyzer>
</results>

Problems with the current format

  • The name of the isValidJP2 element is not appropriate for raw codestream validation, because in that case jpylyzer only validates against the codestream specification, not against the JP2 specification.
  • The use of the results element when --wrapper or --recurse is activated is confusing, because it leads to slightly different variations of the output format (it's also a bit ugly).
  • If --wrapper is not used with multiple files, jpylyzer's output is not even well-formed XML, because it contains several root elements.
  • The information inside toolInfo is repeated for each file.
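
The well-formedness problem can be reproduced with any XML parser. Below is a minimal sketch using Python's standard library. The abbreviated jpylyzer element is a stand-in for real 1.x output, and is_well_formed is a helper made up for this illustration (it is not part of jpylyzer):

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for jpylyzer 1.x output for one file (assumption:
# real output has many more sub-elements and a namespace declaration).
single = "<jpylyzer><isValidJP2>True</isValidJP2></jpylyzer>"


def is_well_formed(xml_text):
    """Return True if xml_text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One file: a single root element.
one_file = single
# Two files without --wrapper: two root elements back to back.
two_files = single + "\n" + single
# Two files with --wrapper: the results element supplies the single root.
two_files_wrapped = "<results>" + single + single + "</results>"

for label, text in [("one file", one_file),
                    ("two files", two_files),
                    ("two files, wrapped", two_files_wrapped)]:
    print(label, is_well_formed(text))
```

Only the middle case fails to parse, which is why 1.x needs the results wrapper whenever more than one file is processed.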

Proposal for jpylyzer 2.0 format

  • The root element is always jpylyzer.
  • The jpylyzer element contains one toolInfo element and one or more file elements.
  • Each file element contains the output for one individual file/image. Inside it are the usual sub-elements (fileInfo, statusInfo, etc.)
  • The isValidJP2 element is replaced by the new isValid element. A format attribute defines the validation format (allowed values: jp2 for JP2 validation, and j2c for raw codestream validation). The validation format is defined by the new --format command-line option (if this option is not set, jpylyzer validates against JP2 by default).

The Figure below gives an overview of the revised format:

Examples

Output for 1 single JP2 using JP2 validation (complete output available here):

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-2-0.xsd">
<toolInfo>
    <toolName>jpylyzer</toolName>
    <toolVersion>2.0.0a1</toolVersion>
</toolInfo>
<file>
    <fileInfo>
        <fileName>aware.jp2</fileName>
        <filePath>/home/johan/jpylyzer-test-files/aware.jp2</filePath>
        <fileSizeInBytes>662735</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValid format="jp2">True</isValid>
    <tests/>
    <properties>
        ::
    </properties>
</file>
</jpylyzer>

Output for 1 single codestream using codestream validation (complete output available here):

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-2-0.xsd">
<toolInfo>
    <toolName>jpylyzer</toolName>
    <toolVersion>2.0.0a1</toolVersion>
</toolInfo>
<file>
    <fileInfo>
        <fileName>is_codestream.j2c</fileName>
        <filePath>/home/johan/jpylyzer-test-files/is_codestream.j2c</filePath>
        <fileSizeInBytes>628385</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValid format="j2c">True</isValid>
    <tests/>
    <properties>
        ::
    </properties>
</file>
</jpylyzer>

Output for multiple files (complete output available here):

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-2-0.xsd">
<toolInfo>
    <toolName>jpylyzer</toolName>
    <toolVersion>2.0.0a1</toolVersion>
</toolInfo>
<file>
    <fileInfo>
        <fileName>openJPEG15.jp2</fileName>
        <filePath>/home/johan/test/openJPEG15.jp2</filePath>
        <fileSizeInBytes>670372</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValid format="jp2">True</isValid>
    <tests/>
    <properties>
        ::
    </properties>
</file>
<file>
    <fileInfo>
        <fileName>palettedImage.jp2</fileName>
        <filePath>/home/johan/test/palettedImage.jp2</filePath>
        <fileSizeInBytes>317550</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 08:28:52 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>True</success>
    </statusInfo>
    <isValid format="jp2">True</isValid>
    <tests/>
    <properties>
        ::
    </properties>
</file>
</jpylyzer>
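
To illustrate how a consumer might read this structure, here is a rough sketch using Python's standard library. It assumes the unversioned namespace URI shown in the examples above; the SAMPLE string is an abbreviated stand-in for real output, and summarise is a hypothetical helper rather than anything jpylyzer provides:

```python
import xml.etree.ElementTree as ET

# Namespace as used in the examples above (assumption: unchanged in 2.0).
NS = {"j": "http://openpreservation.org/ns/jpylyzer/"}

# Abbreviated stand-in for jpylyzer 2.0 output for two files.
SAMPLE = """<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/">
<toolInfo>
    <toolName>jpylyzer</toolName>
    <toolVersion>2.0.0a1</toolVersion>
</toolInfo>
<file>
    <fileInfo><fileName>openJPEG15.jp2</fileName></fileInfo>
    <isValid format="jp2">True</isValid>
</file>
<file>
    <fileInfo><fileName>palettedImage.jp2</fileName></fileInfo>
    <isValid format="jp2">True</isValid>
</file>
</jpylyzer>"""


def summarise(xml_text):
    """Return (tool version, [(file name, validation format, is valid), ...])."""
    root = ET.fromstring(xml_text)
    # toolInfo occurs once per document, not once per file.
    tool_version = root.findtext("j:toolInfo/j:toolVersion", namespaces=NS)
    results = []
    for file_elt in root.findall("j:file", NS):
        name = file_elt.findtext("j:fileInfo/j:fileName", namespaces=NS)
        is_valid = file_elt.find("j:isValid", NS)
        # The format attribute says which specification was validated against.
        results.append((name, is_valid.get("format"), is_valid.text == "True"))
    return tool_version, results


version, results = summarise(SAMPLE)
print(version)
for name, fmt, valid in results:
    print(name, fmt, valid)
```

Because the root element and the file wrapper are always present, the same loop handles one file or many without special cases.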

Backward compatibility

Since these changes will break existing workflows, jpylyzer 2 will have a new --legacyout option. When it is activated, output is reported in jpylyzer 1.x format. Codestream validation cannot be used if --legacyout is activated.
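
Workflows that have to accept both formats during a transition can tell them apart from the document structure alone. A rough sketch, again assuming the unversioned namespace from the examples above; detect_format and q are hypothetical helpers for illustration, not part of jpylyzer:

```python
import xml.etree.ElementTree as ET

# Namespace as used in the examples above (assumption: same for 1.x and 2.0).
NS_URI = "http://openpreservation.org/ns/jpylyzer/"


def q(tag):
    """Return the namespace-qualified tag name in ElementTree notation."""
    return "{%s}%s" % (NS_URI, tag)


def detect_format(xml_text):
    """Classify jpylyzer output by looking at its root element and children."""
    root = ET.fromstring(xml_text)
    if root.tag == q("results"):
        # 1.x output produced with --wrapper: results is the root element.
        return "1.x (wrapper)"
    if root.tag == q("jpylyzer"):
        if root.find(q("file")) is not None:
            # 2.0 output: per-file results are wrapped in file elements.
            return "2.0"
        if root.find(q("isValidJP2")) is not None:
            # 1.x single-file output: isValidJP2 is a direct child of the root.
            return "1.x"
    return "unknown"


v1 = ('<jpylyzer xmlns="%s"><isValidJP2>True</isValidJP2></jpylyzer>' % NS_URI)
v2 = ('<jpylyzer xmlns="%s"><file><isValid format="jp2">True</isValid>'
      '</file></jpylyzer>' % NS_URI)
print(detect_format(v1))
print(detect_format(v2))
```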

Implications for --wrapper option

As jpylyzer 2 always wraps the output of all analysed files into well-formed XML, the --wrapper option will be ignored by default. The option will remain available for use with the --legacyout option. However, it will be marked as deprecated in the documentation and help text.

@tledoux

tledoux commented Jul 23, 2019

Hi Johan, some comments about your proposal.

1/ you should put a version in your namespace if it's not backward compatible
2/ as proposed in my PR, fileLastModified should be in xs:dateTime format (otherwise it depends on the current locale...)
3/ it would be nice to have a propertiesExtension tag (in the same manner as PREMIS, and of type extensionComplexType) in order to enable the output of other information, like the MIX addition I proposed in my second PR.

Thanks for making this tool so complete and useful.

@bitsgalore

bitsgalore commented Aug 7, 2019

Hi Thomas,

Thanks for your comments, this is really useful.

1. Namespace version

From what I understand you propose something like this:

<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/v2/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/v2/ http://jpylyzer.openpreservation.org/jpylyzer-v-2-0.xsd">

And then in the XSD schema:

<xs:schema 
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://openpreservation.org/ns/jpylyzer/v2/"
  xmlns="http://openpreservation.org/ns/jpylyzer/v2/"
  elementFormDefault="qualified">

Is that correct? (Looking here I see someone warns against including minor versions in namespaces; in any case the number of possibilities available for dealing with versions is pretty dazzling, esp. looking at this).

2. fileLastModified

Yes this is a good change, I already merged your PR into master.

3. propertiesExtension

See my comments to your PR here.

@tledoux

tledoux commented Aug 7, 2019

Hi Johan

Indeed, namespace versioning is only needed for major versions (incompatible changes), so your implementation answers this need perfectly.

Concerning my last proposal, the idea is to make the schema more lax by enabling extensions for future uses. This is done systematically in the PREMIS schema and is very useful (for example, you can add information to a jpylyzer output and still have valid XML).

An example from PREMIS is:

<xs:element name="agentExtension" type="extensionComplexType"/>

<!--  
**************** extension definition
-->
<xs:complexType name="extensionComplexType">
  <xs:sequence>
    <xs:any namespace="##any" processContents="lax" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

This addition is independent of whether the MIX output is implemented or not, since many other uses can be foreseen.

@bitsgalore

For info, a jpylyzer branch that integrates the modifications by tledoux into the v2 code can now be found here. It also includes an updated version of the XSD schema.
