@welblaud
Last active October 14, 2022 06:18
A module for preparing TEI Simple XML files stored in eXist-db for later use in InDesign
xquery version "3.0";
module namespace dtp-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/dtp-utils';
import module namespace cust-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/cust-utils' at 'cust-utils.xqm';
declare namespace tei = 'http://www.tei-c.org/ns/1.0';
(:~ This module converts TEI Simple XML in memory into XML suitable for
: importing into InDesign. It is not a silver bullet: it was developed and tested
: for one very specific scenario. However, it should be useful for anyone who uses
: TEI Simple XML (possibly after minor changes to the replacement strings and so on)
: and wants to typeset the data in InDesign. It works around a couple of well-known
: obstacles that prevent many people from importing XML into InDesign.
:
: The basic idea is:
: 1) Grab the file/book.
: 2) Make some minor textual changes that are not useful for files stored
: in the DB (here specifically for Czech prepositions).
: 3) For graphics, replace the @url attribute with @href and prepend file:///img/
: to relative paths. The prefix depends on where you want to store images in
: the archive/folder.
: 4) For every title (head element), append the number of its level in the
: document hierarchy to its name; this makes it much easier to style headings
: automatically later.
: 5) It is not easy to import notes as footnotes in InDesign. The workaround
: is to wrap every note in a chosen pattern that is later easily
: recognized by a special script, which I also provide as a Gist (URL below).
: Because it is better to wrap everything in an element, I wrap notes
: in <footnoteForInDesign/>. And because every note contains one or
: more paragraphs that must be distinguishable from the paragraphs
: outside the note (for automatic styling), I rename these to pnote.
: 6) InDesign does not support tables as they are encoded in TEI XML. However,
: it supports the CALS standard for tables, so I convert every table into
: this standard.
: 7) All of the functionality above is split into separate functions,
: which keeps the module modular.
: 8) At the end, I pack the file together with all its images. If an
: image is missing in the DB or its path is wrong, a text file
: logging the problem is added in place of the image.
: 9) In InDesign (tested on version CS6):
: i) create a new file;
: ii) import the XML;
: iii) in the import dialog, check in particular the last option, about
: importing CALS tables (if you need that); the rest is up to you;
: iv) do not panic: the imported file will certainly include a lot
: of annoying whitespace. It is handy to remove it with GREP;
: the regex that works for me is (\r\s{2,})|(\s{2,}\r), replaced
: with \r; other ways are much more complicated;
: v) if you did not prepare the styling and tag-to-style associations
: earlier, do it now: map TAGS to STYLES (I don’t have much experience
: with the opposite action of mapping styles to tags);
: vi) you will now see that your notes are still present in the text. Feel
: free to use the script linked below (Window > Utilities > Scripts,
: User folder, right-click and open it in Explorer or Finder, put script.js
: into it, close the Finder/Explorer window, and run the script from the panel,
: where it should now be listed); it converts all notes wrapped in @foot_beg@
: and @foot_end@ into real footnotes;
: vii) if there is a line break between the index numbers and the bodies of
: footnotes, remove it with GREP (replace \t\r with \t or \s, as desired);
: viii) if you have problems with some pictures (they are “hidden” and overflow
: at the end of the document or of one of its parts because their
: DPI metadata is missing), repair them in Photoshop,
: GIMP, or a similar tool (simply add the DPI value). It is a good idea to repair
: them before the import; I have no experience with doing that later. ~:)
(: Prepare Footnotes takes every instance of tei:note and wraps it in
: <footnoteForInDesign>@foot_beg@ … @foot_end@</footnoteForInDesign>.
: This is needed by a special InDesign script that moves notes placed
: inline in the text into the footnote area at the bottom of the page.
: The script: https://gist.github.com/welblaud/c21a96f2f23db58b4011726cf21addb8
: It is also very handy to rename the paragraphs to pnote or another custom
: name that differentiates them from the paragraphs outside the note;
: styling and style assignment are much easier this way. :)
declare function dtp-utils:prepare-footnotes($document as node()*) as item()* {
for $node in $document
return
typeswitch ($node)
(: for a document node, processes its children :)
case document-node() return
dtp-utils:prepare-footnotes($node/node())
case element() return
(: if the element is note, wrap it into footnoteForInDesign
element and @foot_beg@/@foot_end@ strings :)
if (xs:string(name($node)) eq 'note') then
element { 'footnoteForInDesign' } {
'@foot_beg@',
element { name($node) } {
$node/@*,
dtp-utils:prepare-footnotes($node/node())
},
'@foot_end@'
}
(: if the element is a paragraph inside the note [simple:footnote:text],
rename it to pnote :)
else if ($node/@rendition eq 'simple:footnote:text') then
element { 'pnote' } {
$node/@*,
dtp-utils:prepare-footnotes($node/node())
}
(: the rest of elements are passed through :)
else
element { node-name($node) } {
$node/@*,
dtp-utils:prepare-footnotes($node/node())
}
(: text nodes are passed through too :)
case text() return
$node
(: the rest is omitted: processing instructions, comments :)
default return
()
};
(: Prepare Heads takes every head (except those in figures and tables) and renames
: it to head plus its level, i.e. the number of its ancestor tei:div elements
: (head1, head2, ...). It is useful for applying/mapping styles in InDesign. :)
declare function dtp-utils:prepare-heads($document as node()*) as item()* {
for $node in $document
return
typeswitch ($node)
(: for a document node, processes its children :)
case document-node() return
dtp-utils:prepare-heads($node/node())
(: return the node but rename it according to the level, omit heads in figures and tables :)
case element() return
if (xs:string(name($node)) eq 'head' and not($node/parent::tei:figure) and not($node/parent::tei:table)) then
element { name($node) || count($node/ancestor::tei:div) } {
dtp-utils:prepare-heads($node/node())
}
else
element { node-name($node) } {
$node/@*,
dtp-utils:prepare-heads($node/node())
}
case text() return
$node
default return
()
};
(: Prepare Images takes every graphic and replaces its @url attribute
: with @href, which is preferred in InDesign, prefixing the value with
: file:///img/. Other attributes of the graphic are dropped, so the
: typesetter sees only the path and can place the image manually,
: which is less error-prone. :)
declare function dtp-utils:prepare-images($document as node()*) as item()* {
for $node in $document
return
typeswitch ($node)
(: for a document node, processes its children :)
case document-node() return
dtp-utils:prepare-images($node/node())
case element() return
(: if the element is graphic, replaces the url attribute :)
if (xs:string(name($node)) eq 'graphic') then
element { name($node) } {
attribute { 'href' } { 'file:///img/' || $node/@url },
dtp-utils:prepare-images($node/node())
}
else
element { node-name($node) } {
$node/@*,
dtp-utils:prepare-images($node/node())
}
case text() return
$node
default return
()
};
(: Prepare Tables transforms any table into the CALS standard, which is supported
: by InDesign natively. Tables are then imported automatically as tables!
: Because the function could be used at a point where the elements have
: lost their association with the default namespace, it is useful to
: handle table elements both IN and OUT of the TEI namespace. :)
declare function dtp-utils:prepare-tables($document as node()*) as item()* {
for $node in $document
return
typeswitch ($node)
case document-node() return
(: for a document node, processes its children :)
dtp-utils:prepare-tables($node/node())
case element() return
(: if the node is table, returns the element table :)
if (xs:string(name($node)) eq 'table') then
element { name($node) } {
(: changes head to title at the top of the tgroup :)
element { 'title' } {
data($node/tei:head)
},
(: wraps thead and tbody into tgroup with the appropriate number of columns (counted from the first row) :)
element { 'tgroup' } {
attribute { 'cols' } { count($node/tei:row[1]/tei:cell) },
for $cell at $count in ($node/tei:row[1]/tei:cell)
return
(: return colspec empty element for every column, name it :)
element { 'colspec' } { attribute { 'colname' } { 'coll_' || $count }, () },
(: makes thead from the label row :)
element { 'thead' } {
element { 'row' } {
for $cell in ($node/tei:row[@role='label']/tei:cell)
return
dtp-utils:prepare-tables($cell)
}
},
(: makes tbody :)
element { 'tbody' } {
for $row in ($node/tei:row[not(@role)])
return
element { 'row' } {
for $cell in ($row/tei:cell)
return
dtp-utils:prepare-tables($cell)
}
}
}
}
(: for every non-empty cell, returns an entry element; a @cols attribute yields
nameend and namest attributes, a @rows attribute yields a morerows attribute.
$node/position() is evaluated against the single $node, so it is always 1 :)
else if (xs:string(name($node)) eq 'cell' and $node//node()) then
element { 'entry' } {
if ($node/@cols) then
attribute { 'nameend' } { 'coll_' || $node/position() + $node/@cols }
else (),
if ($node/@cols) then
attribute { 'namest' } { 'coll_' || $node/position() }
else (),
if ($node/@rows) then
attribute { 'morerows' } { $node/@rows }
else (),
dtp-utils:prepare-tables($node/node())
}
(: if the cell is empty, returns nothing – CALS does not allow empty cells :)
else if (xs:string(name($node)) eq 'cell' and not($node//node())) then
()
(: for any other element in the document, returns it as is :)
else
element { node-name($node) } {
$node/@*,
dtp-utils:prepare-tables($node/node())
}
(: if the $node is text(), returns it as is :)
case text() return
$node
(: drops any other things :)
default return
()
};
(: Prepare for InDesign takes a document, sanitizes all Czech prepositions,
: dashes and § characters (puts a non-breaking space after or before each
: of them), renames every head according to its level (e.g. head4) so that
: heads of different levels can be styled differently, replaces @url
: with @href in all graphics, prepares footnotes for later use with
: a special script in InDesign, and converts tables into the CALS standard,
: which is supported by InDesign natively. :)
declare function dtp-utils:prepare-for-indesign($document as node()) as node() {
let $pass1 := cust-utils:sanitize-spaces($document)
let $pass2 := dtp-utils:prepare-heads($pass1)
let $pass3 := dtp-utils:prepare-images($pass2)
let $pass4 := dtp-utils:prepare-footnotes($pass3)
let $pass5 := dtp-utils:prepare-tables($pass4)
return $pass5
};
(: Pack for DTP packs the necessary files in the same way as the function for packing
: entries for ePub. If an image is missing in the DB, a text file named after
: the missing image is added to the archive instead, and the path that
: could not be resolved is written into the body of that text file. :)
declare function dtp-utils:pack-for-dtp($document as node(), $doc-uri as xs:string, $name as xs:string) as xs:base64Binary {
let $archiveName as xs:string := $name
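(: strip the file name; the pattern must not match a zero-length string, which fn:replace rejects :)
let $root as xs:string := replace($doc-uri, '[^/]+$', '')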
let $doc-prepared := dtp-utils:prepare-for-indesign($document)
(: Main Document :)
let $doc := <entry name="files/{$name}.xml" type="xml">{$doc-prepared}</entry>
(: Pics :)
let $pics as item()* :=
(
let $images := $document//tei:graphic
for $fileName in distinct-values($images/@url)
let $res := $root || 'img/hires/' || $fileName
return
if (util:binary-doc-available($res)) then
<entry name='files/img/{$fileName}' type='binary'>{util:binary-doc($res)}</entry>
else
<entry name='files/img/{$fileName}-url-error.txt' type='text'>Error in the file name or link: {$res}</entry>
)
let $entries as node()* := ($doc, $pics)
let $zip-file as item() := compression:zip($entries, true())
return
response:stream-binary($zip-file, 'application/zip', lower-case(replace($archiveName, ' ', '-')) || '.zip')
};
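A minimal sketch of calling the module from a main query in eXist-db. The module file name dtp-utils.xqm, the document path, and the archive name are assumptions made for illustration; adjust them to your application. Because pack-for-dtp streams its result into the HTTP response, the query is meant to be run through a request, not from a plain script.

```xquery
xquery version "3.0";

(: Assumption: this module is saved as dtp-utils.xqm next to the query. :)
import module namespace dtp-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/dtp-utils' at 'dtp-utils.xqm';

(: Hypothetical path to a TEI Simple document stored in the database. :)
let $doc-uri := '/db/apps/karolinum-x/data/sample-book.xml'
let $document := doc($doc-uri)
return
    (: The name 'Sample Book' is lower-cased and its spaces become hyphens,
       so the archive is offered as sample-book.zip. :)
    dtp-utils:pack-for-dtp($document, $doc-uri, 'Sample Book')
```

With these assumed values, images are looked up under /db/apps/karolinum-x/data/img/hires/, since the function derives the folder from the document URI.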
<figure xml:id="fig1">
<graphic url="tealover.jpg"/>
<head>Illustrandum tealover</head>
</figure>
<figure xml:id="fig1">
<graphic href="file:///img/tealover.jpg"/>
<head>Illustrandum tealover</head>
</figure>
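The two figure snippets above read as the input and the result of the image pass. The following stand-alone sketch reproduces the idea without the module or eXist-db; local:images is a hypothetical name, and unlike the module it keeps the element's namespace by using node-name().

```xquery
xquery version "3.0";

(: Hypothetical stand-alone sketch of the @url -> @href rewrite;
   local:images is an illustrative name, not part of the module. :)
declare function local:images($nodes as node()*) as node()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case document-node() return local:images($node/node())
      case element() return
        if (local-name($node) eq 'graphic') then
          (: rebuild the graphic with @href only; other attributes are dropped :)
          element { node-name($node) } {
            attribute href { 'file:///img/' || $node/@url },
            local:images($node/node())
          }
        else
          (: copy every other element and recurse into its children :)
          element { node-name($node) } {
            $node/@*,
            local:images($node/node())
          }
      case text() return $node
      (: comments and processing instructions are dropped :)
      default return ()
};

local:images(
  <figure xml:id="fig1">
    <graphic url="tealover.jpg"/>
    <head>Illustrandum tealover</head>
  </figure>
)
```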
<note place="bottom" xml:id="ftn1">
<p rendition="simple:footnote:text">
<hi rendition="simple:italic">Veškerý</hi> žoust je v jídle!</p>
</note>
<footnoteForInDesign>
@foot_beg@
<note place="bottom" xml:id="ftn1">
<pnote rendition="simple:footnote:text">
<hi rendition="simple:italic">Veškerý</hi> žoust je v jídle!</pnote>
</note>
@foot_end@
</footnoteForInDesign>
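The note snippets above read as the input and the result of the footnote pass. A stand-alone sketch of that wrapping, runnable in any XQuery 3.0 processor, might look as follows; local:footnotes is a hypothetical name and the code is an illustration under that assumption, not the module itself.

```xquery
xquery version "3.0";

(: Hypothetical stand-alone sketch of the footnote wrapping. :)
declare function local:footnotes($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case document-node() return local:footnotes($node/node())
      case element() return
        if (local-name($node) eq 'note') then
          (: wrap the note and surround it with the marker strings :)
          element footnoteForInDesign {
            '@foot_beg@',
            element { node-name($node) } { $node/@*, local:footnotes($node/node()) },
            '@foot_end@'
          }
        else if ($node/@rendition = 'simple:footnote:text') then
          (: paragraphs inside notes are renamed to pnote :)
          element pnote { $node/@*, local:footnotes($node/node()) }
        else
          element { node-name($node) } { $node/@*, local:footnotes($node/node()) }
      case text() return $node
      default return ()
};

local:footnotes(
  <note place="bottom" xml:id="ftn1">
    <p rendition="simple:footnote:text"><hi rendition="simple:italic">Veškerý</hi> žoust je v jídle!</p>
  </note>
)
```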
<table rendition="simple:rules">
<head>Tabule k rozčicům</head>
<row role="label">
<cell>půlpik</cell>
<cell>dolot</cell>
<cell>tujta</cell>
<cell>xorosol</cell>
</row>
<row>
<cell cols="2">nikdy</cell>
<cell>bodok
<note place="bottom" xml:id="ftn3">
<p rendition="simple:footnote:text">Ahoj 2</p>
</note>
</cell>
<cell/>
</row>
<row>
<cell rows="2">jednou</cell>
<cell>jutoj</cell>
<cell>bodok
<note place="bottom" xml:id="ftn4">
<p rendition="simple:footnote:text">Ahoj 3</p>
</note>
</cell>
<cell/>
</row>
<row>
<cell>jednou</cell>
<cell>nikdy</cell>
<cell>bodok
<note place="bottom" xml:id="ftn5">
<p rendition="simple:footnote:text">Ahoj 2</p>
</note>
</cell>
</row>
</table>
<table>
<title>Tabule k rozčicům</title>
<tgroup cols="4">
<colspec colname="coll_1"/>
<colspec colname="coll_2"/>
<colspec colname="coll_3"/>
<colspec colname="coll_4"/>
<thead>
<row>
<entry>půlpik</entry>
<entry>dolot</entry>
<entry>tujta</entry>
<entry>xorosol</entry>
</row>
</thead>
<tbody>
<row>
<entry nameend="coll_3" namest="coll_1">nikdy</entry>
<entry>bodok
<footnoteForInDesign>@foot_beg@
<note place="bottom" xml:id="ftn3">
<pnote rendition="simple:footnote:text">Ahoj 2</pnote>
</note>
@foot_end@</footnoteForInDesign>
</entry>
</row>
<row>
<entry morerows="2">jednou</entry>
<entry>jutoj</entry>
<entry>bodok
<footnoteForInDesign>@foot_beg@
<note place="bottom" xml:id="ftn4">
<pnote rendition="simple:footnote:text">Ahoj 3</pnote>
</note>
@foot_end@</footnoteForInDesign>
</entry>
</row>
<row>
<entry>jednou</entry>
<entry>nikdy</entry>
<entry>bodok
<footnoteForInDesign>@foot_beg@
<note place="bottom" xml:id="ftn5">
<pnote rendition="simple:footnote:text">Ahoj 2</pnote>
</note>
@foot_end@</footnoteForInDesign>
</entry>
</row>
</tbody>
</tgroup>
</table>
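CALS reads namest and nameend as the first and last column of a horizontal span, and morerows as the number of additional rows a cell covers. As a variation on the cell handling, the sketch below derives those values from a cell's preceding siblings. The names local:start-col and local:entry are hypothetical, the sample row is made up, and the sketch assumes un-namespaced cells and ignores cells pushed to the right by row spans from earlier rows.

```xquery
xquery version "3.0";

(: Hypothetical helper: the 1-based column at which a cell starts,
   counting the @cols spans of the cells before it in the same row. :)
declare function local:start-col($cell as element()) as xs:integer {
  xs:integer(1 + sum(
    for $prev in $cell/preceding-sibling::*[local-name() eq 'cell']
    return if ($prev/@cols) then xs:integer($prev/@cols) else 1
  ))
};

(: Hypothetical helper: turns one cell into a CALS entry. :)
declare function local:entry($cell as element()) as element(entry) {
  let $start := local:start-col($cell)
  return
    element entry {
      if ($cell/@cols) then (
        attribute namest  { 'coll_' || $start },
        (: a span of n columns ends n - 1 columns after its start :)
        attribute nameend { 'coll_' || ($start + xs:integer($cell/@cols) - 1) }
      ) else (),
      if ($cell/@rows) then
        (: morerows counts the additional rows only :)
        attribute morerows { xs:integer($cell/@rows) - 1 }
      else (),
      $cell/node()
    }
};

for $cell in
  <row>
    <cell cols="2">nikdy</cell>
    <cell>bodok</cell>
    <cell rows="2">jednou</cell>
  </row>/cell
return local:entry($cell)
```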