When converting content in an archive, it is useful for diagnostic purposes to record the versions of the major software components used and any important conversion options. Another common use case is identifying records that later need to be reconverted with newer software, either to improve conversion quality or to fix records that were misconverted because of a bug or an incorrect option.
The WARC-Conversion-Software field indicates the version of software components used in the conversion of the record's content. The field value has the same format as an HTTP User-Agent field (see RFC 7231 section 5.5.3) and consists of a list of one or more product identifiers and zero or more comments.
WARC-Conversion-Software = product *( RWS ( product / comment ) )
product = token [ "/" product-version ]
product-version = token
comment = "(" *( ctext / quoted-pair / comment ) ")"
For example:
WARC-Conversion-Software: ImageMagick/6.9.9-38 (x86 linux)
Multiple product identifiers may be used to indicate the version of important subcomponents such as codec libraries used when encoding a video.
WARC-Conversion-Software: ffmpeg/4.0.3 libvpx/1.8.0 libopus/1.3
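The grammar above can be read with a small hand-written scanner. The following is a minimal sketch, not a reference implementation: the function name `parse_conversion_software` and the tuple layout it returns are assumptions made for illustration, and the scanner is lenient about whitespace between items.

```python
import re

# Token characters as defined for HTTP header fields (RFC 7230 section 3.2.6).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PRODUCT_RE = re.compile(rf"({TOKEN})(?:/({TOKEN}))?")


def parse_conversion_software(value):
    """Return a list of ('product', name, version) and ('comment', text) items."""
    items = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch in " \t":
            i += 1
        elif ch == "(":
            # Comments may nest and may contain backslash-escaped characters.
            depth, j, text = 1, i + 1, []
            while j < n:
                c = value[j]
                if c == "\\" and j + 1 < n:
                    text.append(value[j + 1])
                    j += 2
                    continue
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                text.append(c)
                j += 1
            if depth != 0:
                raise ValueError("unterminated comment")
            items.append(("comment", "".join(text)))
            i = j + 1
        else:
            m = PRODUCT_RE.match(value, i)
            if not m:
                raise ValueError(f"unexpected character at offset {i}")
            # The version is None when the product has no "/version" part.
            items.append(("product", m.group(1), m.group(2)))
            i = m.end()
    if not items or items[0][0] != "product":
        raise ValueError("field value must start with a product identifier")
    return items
```

Applied to the two examples above, this yields one product and one comment for the ImageMagick value, and three products for the ffmpeg value.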
When product identifiers represent multiple steps in a processing pipeline, they should be listed in processing order, and otherwise in decreasing order of significance for identifying the software. For example, a TIFF image decoded with an unknown version of libtiff and then re-encoded with libjpeg version 9c could be recorded as:
WARC-Conversion-Software: libtiff libjpeg/9c
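A conversion tool could assemble the field value from its pipeline steps along these lines. This is a sketch only: `format_conversion_software` is a hypothetical helper, and it assumes the caller passes the steps already in processing order, with `None` standing for an unknown version.

```python
import re

# Token characters as defined for HTTP header fields (RFC 7230 section 3.2.6).
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def format_conversion_software(steps):
    """Build a field value from (name, version_or_None) pairs in processing order."""
    if not steps:
        raise ValueError("at least one product identifier is required")
    parts = []
    for name, version in steps:
        # Both the product name and the version must be valid tokens.
        for text in (name, version):
            if text is not None and not _TOKEN.fullmatch(text):
                raise ValueError(f"not a valid token: {text!r}")
        parts.append(name if version is None else f"{name}/{version}")
    return " ".join(parts)


print(format_conversion_software([("libtiff", None), ("libjpeg", "9c")]))
```

Validating each part as a token up front avoids emitting a value that a reader of the field could not split back into products.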
Software components unimportant to the conversion process, such as other codecs that a video transcoder happens to support but did not use, should not be listed.
The WARC-Conversion-Software field may be used in ‘conversion’ type records and shall not be used for other record types.
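One use of the field is selecting records for reconversion after a bug is found in a particular software version. The sketch below assumes record headers are already available as a plain dictionary; `listed_products`, `needs_reconversion` and the version string marked as bad are illustrative assumptions, not part of any WARC library.

```python
def listed_products(value):
    """Return {name: version_or_None} for the products in a field value."""
    depth, kept, i = 0, [], 0
    while i < len(value):
        c = value[i]
        if c == "\\" and depth > 0:
            i += 2  # skip a backslash-escaped character inside a comment
            continue
        if c == "(":
            depth += 1
        elif c == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(c)  # keep only text outside comments
        i += 1
    result = {}
    for item in "".join(kept).split():
        name, _, version = item.partition("/")
        result[name] = version or None
    return result


def needs_reconversion(headers, product, bad_versions):
    """True if a 'conversion' record used one of the given versions of product."""
    if headers.get("WARC-Type") != "conversion":
        return False
    found = listed_products(headers.get("WARC-Conversion-Software", ""))
    return product in found and found[product] in bad_versions


record = {
    "WARC-Type": "conversion",
    "WARC-Conversion-Software": "ImageMagick/6.9.9-38 (x86 linux)",
}
# The "bad" version here is purely illustrative.
print(needs_reconversion(record, "ImageMagick", {"6.9.9-38"}))
```

Matching on exact version strings sidesteps the question of how to order versions such as "9c" or "6.9.9-38", which differs between projects.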
The WARC-Conversion-Options field indicates the options used when converting the content. The format of the field value is specific to the conversion software used.
WARC-Conversion-Options = *TEXT
Some examples:
WARC-Conversion-Options: acodec=mp3 bitrate=64
WARC-Conversion-Options: {"lossless": true, "layers": 5}
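When a tool keeps its options in a structured form, serializing them with a JSON library keeps the value well-formed and on a single line. A minimal sketch, assuming the options are held in a dictionary; `options_header` is a hypothetical helper name.

```python
import json


def options_header(options):
    """Render an options dictionary as a one-line WARC-Conversion-Options header."""
    # json.dumps without an indent argument never emits line breaks.
    return "WARC-Conversion-Options: " + json.dumps(options)


print(options_header({"lossless": True, "layers": 5}))
```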
By convention, when the conversion software is configured through command-line options, a full command line should be included, with the tokens {input} and {output} representing the input and output file respectively.
WARC-Conversion-Options: ffmpeg -y -i {input} -c:v vp9 -c:a libopus -speed 4 {output}
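A tool that launches the converter as a subprocess could derive the field value from the argument list it actually ran. The following is a sketch under that assumption; `conversion_options_from_argv` and the file paths are hypothetical. Placeholders are inserted unquoted, while other arguments go through `shlex.quote` so that arguments containing spaces stay unambiguous.

```python
import shlex


def conversion_options_from_argv(argv, input_path, output_path):
    """Render argv as a command line with {input} and {output} placeholders."""
    parts = []
    for arg in argv:
        if arg == input_path:
            parts.append("{input}")
        elif arg == output_path:
            parts.append("{output}")
        else:
            parts.append(shlex.quote(arg))
    return " ".join(parts)


argv = ["ffmpeg", "-y", "-i", "/tmp/in.mkv", "-c:v", "vp9",
        "-c:a", "libopus", "-speed", "4", "/tmp/out.webm"]
print(conversion_options_from_argv(argv, "/tmp/in.mkv", "/tmp/out.webm"))
```

Replacing the real paths also keeps temporary file names, which carry no diagnostic value, out of the archived record.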
A conversion involving multiple steps may be indicated using a shell pipeline:
WARC-Conversion-Options: bzip2 -d | gzip -9
or multiple sequential commands separated by semicolons:
WARC-Conversion-Options: ddjvu -format=tiff {input} tmp.tif; convert tmp.tif tmp.png; pngcrush tmp.png {output}
If the conversion options are not representable in a short text form suitable for including in a header field they may be recorded separately in one or more ‘metadata’ records. In such cases the WARC-Conversion-Options field may still include a short textual summary of only the most important options for diagnostic purposes.
The WARC-Conversion-Options field may be used in ‘conversion’ type records and shall not be used for other record types.