@karenc
Last active January 27, 2019 21:14

CNX Pipeline

Broadly speaking, generating PDFs has two parts:

  1. cnxml -> single html

    1. module cnxml files -> module html files
    2. module html files -> epub -> one single html file
  2. single html -> pdf

    1. single html -> cnx-recipes baked html
    2. cnx-recipes baked html -> baked html with svg / html math
    3. baked html with svg / html math -> pdf

Transforming module CNXML to module HTML

We have a trigger called add_module_file for when a file is added to the database (a new entry in the module_files table):

https://github.com/Connexions/cnx-db/blob/ecd8024579f6623faa9143b35c5c688e1fbbec13/cnxdb/archive-sql/schema/triggers.sql#L250

This calls cnxarchive.database.add_module_file:

https://github.com/Connexions/cnx-archive/blob/431084d7941038a8aaa3c37fadd82d6647add752/cnxarchive/database.py#L700

The part we care about is transforming index.cnxml to index.cnxml.html. The relevant code is cnxdb.triggers.transforms.converters.cnxml_to_full_html:

https://github.com/Connexions/cnx-db/blob/ecd8024579f6623faa9143b35c5c688e1fbbec13/cnxdb/triggers/transforms/converters.py#L107

and cnxdb.triggers.transforms.resolvers.CnxmlToHtmlReferenceResolver (may not be necessary):

https://github.com/Connexions/cnx-db/blob/ecd8024579f6623faa9143b35c5c688e1fbbec13/cnxdb/triggers/transforms/resolvers.py#L328

cnxml_to_full_html uses rhaptos.cnxmlutils to convert CNXML to HTML with XSLT.
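
As a rough illustration of what this kind of transform does, here is a minimal sketch that uses only Python's standard library. It is not the XSLT from rhaptos.cnxmlutils: the tag mapping (`title` to `h1`, `para` to `p`, `emphasis` to `strong`), the sample document, the namespace constant, and the function name `toy_cnxml_to_html` are all assumptions made for the example, and the real stylesheets cover far more of the CNXML vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the sample document below.
CNXML_NS = "http://cnx.rice.edu/cnxml"

# Hypothetical, deliberately tiny mapping from CNXML tags to HTML tags.
TAG_MAP = {"title": "h1", "para": "p", "emphasis": "strong"}


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def _convert(element):
    """Recursively copy an element, renaming mapped tags; others become <div>."""
    out = ET.Element(TAG_MAP.get(_local_name(element.tag), "div"))
    if "id" in element.attrib:
        out.set("id", element.attrib["id"])
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(_convert(child))
    return out


def toy_cnxml_to_html(cnxml_text):
    """Return an HTML string for a CNXML string (toy stand-in for the XSLT)."""
    body = _convert(ET.fromstring(cnxml_text))
    body.tag = "body"
    body.tail = None
    html = ET.Element("html")
    html.append(body)
    return ET.tostring(html, encoding="unicode")


SAMPLE = (
    '<document xmlns="%s" id="m00001">' % CNXML_NS
    + "<title>Sample</title>"
    + '<content><para id="p1">Hello <emphasis>world</emphasis>.</para></content>'
    + "</document>"
)

if __name__ == "__main__":
    print(toy_cnxml_to_html(SAMPLE))
```

The sketch walks the tree once and renames elements, which is the simplest form of the template matching an XSLT stylesheet performs. Reference resolution (the job of CnxmlToHtmlReferenceResolver) is left out.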

Transforming module HTML to single HTML

The single HTML script is in cnx-epub:

https://github.com/Connexions/cnx-epub/blob/51096c21f1f492be7285c3493c14147a5c17d074/cnxepub/scripts/single_html/main.py

This script takes an epub as input and outputs a single HTML file.

We need to create an epub from the module HTML files and the COLLXML file (which holds the book structure) using the models in cnxepub:

https://github.com/Connexions/cnx-epub/blob/51096c21f1f492be7285c3493c14147a5c17d074/cnxepub/models.py#L376

and create an epub with cnxepub.adapters.make_epub:

https://github.com/Connexions/cnx-epub/blob/51096c21f1f492be7285c3493c14147a5c17d074/cnxepub/adapters.py#L94
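
Before the models can be populated, the book structure has to be read out of the COLLXML file. The following is a minimal sketch of that step with the standard library only. The namespaces, the element and attribute names in the sample (`col:subcollection`, `col:module`, `document`, `md:title`), and the helper names `read_tree` and `module_ids` are assumptions for illustration. Building the cnxepub model objects from the result and calling make_epub are not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for the sample below.
NS = {
    "col": "http://cnx.rice.edu/collxml",
    "md": "http://cnx.rice.edu/mdml",
}

SAMPLE_COLLXML = """\
<col:collection xmlns:col="http://cnx.rice.edu/collxml"
                xmlns:md="http://cnx.rice.edu/mdml">
  <col:content>
    <col:subcollection>
      <md:title>Chapter 1</md:title>
      <col:content>
        <col:module document="m00001"><md:title>Intro</md:title></col:module>
        <col:module document="m00002"><md:title>Details</md:title></col:module>
      </col:content>
    </col:subcollection>
    <col:module document="m00003"><md:title>Appendix</md:title></col:module>
  </col:content>
</col:collection>
"""


def read_tree(content_element):
    """Turn a <col:content> element into a nested list of dicts."""
    nodes = []
    for child in content_element:
        title = child.findtext("md:title", default="", namespaces=NS)
        if child.tag == "{%s}module" % NS["col"]:
            nodes.append({"id": child.get("document"), "title": title})
        elif child.tag == "{%s}subcollection" % NS["col"]:
            inner = child.find("col:content", NS)
            contents = read_tree(inner) if inner is not None else []
            nodes.append({"title": title, "contents": contents})
    return nodes


def module_ids(nodes):
    """Flatten the tree into module ids in reading order."""
    ids = []
    for node in nodes:
        if "contents" in node:
            ids.extend(module_ids(node["contents"]))
        else:
            ids.append(node["id"])
    return ids


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_COLLXML)
    tree = read_tree(root.find("col:content", NS))
    print(module_ids(tree))
```

The flattened id list gives the order in which the module HTML files would need to be loaded, while the nested list keeps the chapter grouping.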

Generating PDFs from single HTML

Everything is already in place. Kevin and I have been working on getting the repos running in Docker so that they are easier to install locally:

https://github.com/kevinburleigh75/docker-expers

and

https://github.com/karenc/docker-expers/tree/karen

We still need a script that coordinates the different containers to produce a PDF.
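
One possible shape for such a script is sketched below, assuming each of the three PDF stages runs in its own container and all of them share one working directory mounted at `/data`. The image names (`example/recipes`, `example/mathify`, `example/pdf`), the commands inside the containers, and the file names are placeholders, not the contents of the docker-expers repos. Only the `docker run --rm -v` invocation itself is real.

```python
import shlex
import subprocess

# Hypothetical stages: (name, image, command inside the container).
STAGES = [
    ("bake", "example/recipes", ["bake", "/data/single.html", "/data/baked.html"]),
    ("math", "example/mathify", ["mathify", "/data/baked.html", "/data/baked-math.html"]),
    ("pdf", "example/pdf", ["make-pdf", "/data/baked-math.html", "/data/book.pdf"]),
]


def build_commands(work_dir, stages=STAGES):
    """Return one `docker run` argument list per stage, mounting work_dir at /data."""
    commands = []
    for _name, image, args in stages:
        commands.append(
            ["docker", "run", "--rm", "-v", "%s:/data" % work_dir, image] + args
        )
    return commands


def run_pipeline(work_dir, dry_run=True):
    """Run the stages in order; with dry_run=True only print the commands."""
    for command in build_commands(work_dir):
        if dry_run:
            print(" ".join(shlex.quote(part) for part in command))
        else:
            # check=True raises CalledProcessError, so a failed stage stops the run.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline("/tmp/book")
```

Keeping command construction separate from execution makes the plan easy to inspect with a dry run before any container is started.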
