mmulich/jod-service.rst

## jod-service.rst

      
JOD Service

JOD (Java OpenDocument Converter) is an alternative to *office
headless mode that we wish to use in production. *office headless
mode was used in the past, but it is not ideal and not a suitable
solution for concurrent builds. It will remain available for
developers as a feasible alternative to the heavy-handed JOD
service.
The JOD service was originally built into OERPub's SWORD
implementation. Connexions came across it as a viable solution to
issues raised by developers who were not running the GUI environments
needed for headless *office to run reliably. At the same time, we
realized that *office headless mode could become a bottleneck under
concurrent tasks, a problem the JOD service already solved.
The JOD service gives us the ability to run concurrent builds and to
provide conversion as a service.

Installation

These instructions assume you are using an Ubuntu system, because
this would turn into a small book if we were to include instructions
for every platform. Make your best effort to translate these
instructions to your platform, and feel free to ask the developers for
assistance.

Installing MS Word -> *office dependencies

An *office installation is required (LibreOffice, OpenOffice or
StarOffice). You are free to choose any of them, but this documentation
uses LibreOffice; adapt the steps to whichever you choose.
LibreOffice can be installed with the following command:
$ sudo apt-get install libreoffice

A macro needs to be added to *office for the transform to succeed
(see also: oerpub.rhaptoslabs.swordpushweb):
$ wget https://raw.github.com/oerpub/oerpub.rhaptoslabs.swordpushweb/develop/docs/office_macro/Module1.xba
$ mkdir -p ~/.config/.libreoffice/3/user/basic/Standard/
$ mv Module1.xba ~/.config/.libreoffice/3/user/basic/Standard/


Installing Python-JOD

Python-JOD is a poorly named project that provides an interface for
*office document conversions.
Clone and navigate into the project:
$ git clone https://github.com/oerpub/Python-JOD.git
$ cd Python-JOD


Note
Adjust PIPE_PATH in install.sh to reflect whichever
flavor of *office you installed.

You'll need to install a JDK and Maven in order to run the installer.
On Ubuntu, this would be:
$ sudo apt-get install openjdk-6-jdk maven

Then issue the installer command:
$ sudo JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::") \
JRE_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::") \
./install.sh

See also: https://github.com/oerpub/Python-JOD/blob/master/jodconverter-webapp-build/README.txt

Usage

The '--dev-mode' flag enables *office headless mode rather than the
full-blown JOD service. This mode is intended for development use only.
Status messages are written to standard error (stderr) because
standard out (stdout) is used for piping output.
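A minimal sketch of how such a flag and the stdout/stderr split could look with argparse. Only the '--dev-mode' flag name comes from this document; the build_parser() and main() functions and the "jod"/"headless" backend names are hypothetical.

```python
import argparse
import sys


def build_parser():
    """Hypothetical parser for a conversion command with a --dev-mode switch."""
    parser = argparse.ArgumentParser(description="Convert a document (sketch).")
    parser.add_argument(
        "--dev-mode",
        action="store_true",
        help="use *office headless mode instead of the JOD service",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Hypothetical backend names, chosen for illustration only.
    backend = "headless" if args.dev_mode else "jod"
    # Status messages go to stderr so that stdout carries only the
    # converted document and can be piped into the next command.
    print("Using the {} backend".format(backend), file=sys.stderr)
    return backend


if __name__ == "__main__":
    main()
```

Keeping stdout reserved for document data is what lets the commands be chained with shell pipes.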


## logging.rst

      
Logging

Logging is done in two areas of this package: 1) at the library layer
and 2) at the command-line interface layer.

Library Logger

The library logger is available as logger in cnxtransforms.reporting,
with the logger name cnxtransforms. (The package is named reporting so
that it doesn't conflict with the standard library's logging module.)
>>> from cnxtransforms.reporting import logger

This is the logger that should be used throughout the transformation
library functions.
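A minimal sketch of a library function using the shared logger. It assumes the logger is an ordinary standard-library logging.Logger obtained by name; convert_document() is a hypothetical function, not part of cnxtransforms.

```python
import logging

# Assumption: cnxtransforms.reporting exposes a stdlib logger like this one.
logger = logging.getLogger("cnxtransforms")


def convert_document(name):
    """Hypothetical transform step that reports progress via the library logger."""
    logger.info("Converting %s", name)
    output_name = name + ".odt"
    logger.debug("Output will be written to %s", output_name)
    return output_name
```

Passing arguments to logger.info() instead of pre-formatting the string defers the formatting until a handler actually emits the record, so messages below the configured level stay cheap.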
The package provides a default logging configuration (in the package
as default_logging.cfg), which logs all info-level messages to
standard error (stderr). Library users can customize the logging
configuration by creating their own at
/etc/cnx-transforms/logging.cfg or
~/.cnx-transforms/logging.cfg.

Note
Only one of these configuration files is used at a time, so it may be
useful to copy the one provided in this package as a starting point.
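The lookup can be sketched with the standard library's logging.config.fileConfig. This assumes the .cfg files use fileConfig's INI format; the example file contents, the search order (home directory before /etc) and the find_config()/configure_from_text() helpers are all assumptions, since this document only states that one file is used at a time.

```python
import io
import logging
import logging.config
import os

# Assumed example of a logging.cfg in fileConfig (INI) format: records from
# the "cnxtransforms" logger at INFO and above go to stderr.
EXAMPLE_CFG = """\
[loggers]
keys=root,cnxtransforms

[handlers]
keys=stderr

[formatters]
keys=simple

[logger_root]
level=WARNING
handlers=stderr

[logger_cnxtransforms]
level=INFO
handlers=stderr
qualname=cnxtransforms
propagate=0

[handler_stderr]
class=StreamHandler
level=INFO
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(name)s %(levelname)s: %(message)s
"""

# Locations named in this document; checking the home directory first
# is an assumption.
CANDIDATES = [
    os.path.expanduser("~/.cnx-transforms/logging.cfg"),
    "/etc/cnx-transforms/logging.cfg",
]


def find_config(default_path, candidates=CANDIDATES):
    """Return the first existing user config, else the packaged default."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return default_path


def configure_logging(default_path):
    """Load exactly one configuration file and return its path."""
    path = find_config(default_path)
    # Keep loggers created at import time enabled.
    logging.config.fileConfig(path, disable_existing_loggers=False)
    return path


def configure_from_text(text):
    """Apply a configuration given as a string (handy for trying the example)."""
    logging.config.fileConfig(io.StringIO(text), disable_existing_loggers=False)
```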


Command-line Logger

The command-line logger is available as logger in the cnxtransforms.cli
module, with the logger name cnxtransforms.cli.
This logger is set up similarly to the library logger and, by default,
reports at the info level to stderr.
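Because the name cnxtransforms.cli is dotted, the standard logging module treats it as a child of cnxtransforms, so its records also propagate to the library logger's handlers. A minimal sketch, assuming each logger gets its own stream handler; setup_stderr_logger() is a hypothetical helper, and disabling propagation is one way (not necessarily the package's) to avoid printing each CLI message twice.

```python
import io
import logging


def setup_stderr_logger(name, level=logging.INFO, stream=None):
    """Hypothetical helper: give the named logger its own stream handler.

    StreamHandler(None) writes to sys.stderr.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# StringIO streams stand in for stderr so the output can be inspected.
lib_stream, cli_stream = io.StringIO(), io.StringIO()
lib_logger = setup_stderr_logger("cnxtransforms", stream=lib_stream)
cli_logger = setup_stderr_logger("cnxtransforms.cli", stream=cli_stream)

# "cnxtransforms.cli" is a child of "cnxtransforms"; without this line the
# message would also be emitted by the library logger's handler.
cli_logger.propagate = False
cli_logger.info("converting test-document.docx")
```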


## notes.rst

      
String and Bytes interface

String inherits from io.StringIO with one major difference: it accepts
a name parameter, which makes it a named StringIO buffer. This makes it
possible to pass the buffer straight into a File without explicitly
specifying a name.
>>> from cnxtransforms import String
>>> str_buf = String(u"The dingo ate my baby!",
...                  name="dingo.poo")
>>> str_buf
<String instance of 'dingo.poo'>

Likewise, Bytes has the same named buffer interface, except that it
inherits from io.BytesIO rather than io.StringIO.
>>> from cnxtransforms import Bytes
>>> b_buf = Bytes(b"PK\x03\x04\x14\x00\x00\x08\x00\x00f\x1e\x8aB",
...               name="junk.bin")
>>> b_buf
<Bytes instance of 'junk.bin'>
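A minimal sketch of such named buffers built on a shared mixin. It is an illustration only, not the cnxtransforms implementation; the _NamedBufferMixin name is hypothetical, and Python 3 is assumed, so Bytes takes a bytes literal.

```python
import io


class _NamedBufferMixin:
    """Hypothetical mixin: a descriptive repr based on the buffer's name."""

    def __repr__(self):
        return "<{} instance of {!r}>".format(type(self).__name__, self.name)


class String(_NamedBufferMixin, io.StringIO):
    """Text buffer that carries a file name."""

    def __init__(self, initial_value="", name=None):
        io.StringIO.__init__(self, initial_value)
        self.name = name


class Bytes(_NamedBufferMixin, io.BytesIO):
    """Binary buffer that carries a file name."""

    def __init__(self, initial_bytes=b"", name=None):
        io.BytesIO.__init__(self, initial_bytes)
        self.name = name
```

Putting the mixin first in the bases makes its __repr__ take precedence over the io class's own.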


File interface

File inherits from io.FileIO and adds some context-specific properties.
>>> import os
>>> from cnxtransforms import File
>>> from cnxtransforms import word_to_ooo

>>> address = 'localhost:2002'
>>> filepath = os.path.join('cnxtransforms', 'test-data', 'test-document.docx')
>>> file = File(filepath)
>>> file.filepath
'/mnt/hgfs/cnx-transforms/cnxtransforms/test-data/test-document.docx'
>>> file.filename
'test-document.docx'
>>> file.basepath
'/mnt/hgfs/cnx-transforms/cnxtransforms/test-data'

Call the transform to get the output, which is also a File object.
>>> output = word_to_ooo(file, server_address=address)
>>> output.filepath
'/mnt/hgfs/cnx-transforms/cnxtransforms/test-data/test-document.docx.odt'

By default, the function creates an output object when one isn't
passed in.
Files can also be created on the fly from any IOBase object using the
from_io class method.
>>> import io
>>> io_value = io.BytesIO(b"PK\x03\x04\x14\x00\x00\x08\x00\x00f\x1e\x8aB^\xc62\x0c'\x00\x00\x00'\x00\x00\x00\x08\x00\x00\x00mimetypeapplication/vnd.oasis.opendocument.textPK\x03\x04\x14\x00\x00\x08\x00\x00f\x1e\x8aB\xeeY:\xc8\xd4\xee\x01\x00\xd4\xee\x01\x00-\x00\x00\x00Pictures/10000201000004AD0000020B937CE175.png\x89PNG\r\n")
>>> File.from_io(io_value)
<File instance of '/tmp/...'>
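A minimal sketch of a FileIO subclass with the path properties and from_io() shown above. It is not the real cnxtransforms class; storing text buffers as UTF-8 and using tempfile.mkstemp() for the copy are assumptions.

```python
import io
import os
import tempfile


class File(io.FileIO):
    """Sketch of a FileIO subclass with path helpers (not the real class)."""

    @property
    def filepath(self):
        # FileIO.name holds the path the file was opened with
        # (assumes it was opened by path, not by file descriptor).
        return os.path.abspath(self.name)

    @property
    def filename(self):
        return os.path.basename(self.filepath)

    @property
    def basepath(self):
        return os.path.dirname(self.filepath)

    @classmethod
    def from_io(cls, io_obj):
        """Copy the remaining contents of an IOBase object into a temp file."""
        data = io_obj.read()
        if isinstance(data, str):
            # Assumption: text buffers are stored as UTF-8.
            data = data.encode("utf-8")
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        # Reopen read-only (FileIO's default mode is "r").
        return cls(path)

    def __repr__(self):
        return "<File instance of {!r}>".format(self.filepath)
```

The caller owns the temporary file and is responsible for deleting it once the transform is done.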


File Sequences interface

A FileSequence inherits from collections.abc.MutableSequence. It
provides some specialization that will later allow us to do things
like:
>>> from cnxtransforms import to_zipfile
>>> zip_archive = to_zipfile(file_sequence)
>>> zip_archive
<ZipFile object at 0x...>

In the near term, this is useful when a transform produces more than
one output. For example, the ODT-to-CNXML transform splits the
content from its resources (e.g. images), which results in more than
one output.
>>> from cnxtransforms import File
>>> file = File(os.path.join('cnxtransforms', 'test-data',
...                          'test-document.docx.odt'))
>>> from cnxtransforms import odt_to_cnxml
>>> cnxml = odt_to_cnxml(file)
>>> type(cnxml)
cnxtransforms.FileSequence
>>> cnxml
[<String instance of 'index.cnxml'>,
 <Bytes instance of 'Picture.png'>,
 <Bytes instance of 'Picture.jpg'>,
 <Bytes instance of 'graphics1.jpg'>]

This is also useful when working with zipfiles and other archives. The
following is an example of instantiating a FileSequence from a
zipfile.ZipFile instance.
>>> import zipfile
>>> zfile = zipfile.ZipFile('html.zip')
>>> from cnxtransforms import FileSequence
>>> archive = FileSequence.from_zipfile(zfile)
>>> archive
[<String instance of 'index.html'>,
 <Bytes instance of 'graphic.jpg'>]
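A minimal sketch of a FileSequence with from_zipfile(), plus a to_zipfile() inverse, under stated assumptions: both are illustrations rather than the cnxtransforms code, and for simplicity every archive member becomes a named io.BytesIO, whereas the documented class appears to return String for text members.

```python
import collections.abc
import io
import zipfile


class FileSequence(collections.abc.MutableSequence):
    """Sketch of a list-like container of named buffers (not the real class)."""

    def __init__(self, items=()):
        self._items = list(items)

    # MutableSequence supplies append(), extend(), pop(), etc. once these
    # five abstract methods exist.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

    def __repr__(self):
        return repr(self._items)

    @classmethod
    def from_zipfile(cls, zfile):
        """Read each member of a zipfile.ZipFile into a named BytesIO."""
        seq = cls()
        for info in zfile.infolist():
            if info.is_dir():
                continue
            buf = io.BytesIO(zfile.read(info))
            buf.name = info.filename
            seq.append(buf)
        return seq


def to_zipfile(file_sequence):
    """Hypothetical inverse: pack named buffers into an in-memory ZipFile."""
    raw = io.BytesIO()
    with zipfile.ZipFile(raw, "w") as zf:
        for buf in file_sequence:
            zf.writestr(buf.name, buf.getvalue())
    raw.seek(0)
    return zipfile.ZipFile(raw)
```

Deriving from MutableSequence rather than list keeps the public interface small while still providing the usual list methods.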


Command-line Interface

The command-line interface (CLI) is set up to behave similarly to the
Docutils conversion tools: an input file path can be given, optionally
followed by an output file path (an output path can only be given
together with an input path). When no output path is given, output goes
to standard out (stdout). If the input comes from standard in (stdin),
the output must go to stdout.
Each command is set up to facilitate one transformation that can be
piped into another. For example:
$ cat test-document.docx | word2soffice | soffice2cnxml > content.zip
$ cnxml2html content.zip > html.zip
$ cat html.zip | html2cnxml > cnxml.zip
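This argument handling can be sketched with argparse: two optional positional arguments mean a destination can only be given after a source, and both fall back to the standard streams. The parse_io_args() and open_streams() names are hypothetical, not the cnx-transforms CLI code.

```python
import argparse
import sys


def parse_io_args(argv=None):
    """Parse '[source [destination]]' the way Docutils front ends do."""
    parser = argparse.ArgumentParser(description="Sketch of a transform command.")
    parser.add_argument("source", nargs="?", help="input file (default: stdin)")
    parser.add_argument(
        "destination", nargs="?", help="output file (default: stdout)"
    )
    return parser.parse_args(argv)


def open_streams(args):
    """Return binary (input, output) streams for the parsed arguments."""
    if args.source is None:
        # Input from stdin: the output must go to stdout.
        return sys.stdin.buffer, sys.stdout.buffer
    source = open(args.source, "rb")
    if args.destination is None:
        return source, sys.stdout.buffer
    return source, open(args.destination, "wb")
```

Binary streams (the .buffer attributes) are used because the commands pass documents such as .docx files and zip archives, which are not text.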