Created
January 4, 2010 14:47
OpenOffice Document Conversion
Experiments in running a headless OpenOffice as a document converter for TiddlyDocs, etc.

Running OpenOffice Headless:
$ cd /Applications/OpenOffice.org.app/Contents/program #Mac OSX
$ cd /usr/lib/openoffice.org.app/program #CentOS
$ ./soffice.bin -headless -invisible -nofirststartwizard -accept="socket,port=8100;urp;"
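
Once soffice has been started with the -accept option above, a client may want to confirm that the socket is actually listening before attempting a conversion. The following is a minimal sketch, not part of the original notes: the helper name wait_for_port and its retry parameters are hypothetical, and it is written for a current Python 3 rather than the Python 2.3 used elsewhere here.

```python
import socket
import time


def wait_for_port(host="localhost", port=8100, attempts=5, delay=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    for _ in range(attempts):
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the listener speaks the UNO protocol.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(delay)
    return False
```

A caller could run wait_for_port() after launching soffice and give up if it returns False.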

init script:
see openoffice.sh xvfb.sh for init.d scripts

uses Xvfb for virtual X11 DISPLAY:
$ yum install xorg-x11-fonts*
$ yum install Xvfb
http://www.oooforum.org/forum/viewtopic.phtml?t=11890

Art of Solving Java Open Office document conversion client:
http://www.artofsolving.com/opensource/jodconverter

Running from command line:
$ java -jar jodconverter-cli-2.2.2.jar <input-document> <output-document>
.. expands to multiple files, doesn't handle relative directories well
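
One possible workaround for the relative-directory problem is to resolve every path before handing it to the jar. This is only a sketch: the function name jodconverter_command is hypothetical, and whether absolute paths fully avoid the issue is an assumption, not something tested in these notes. It is written for a current Python 3.

```python
import os


def jodconverter_command(input_path, output_path, jar="jodconverter-cli-2.2.2.jar"):
    """Build the java command line with the jar and both documents made absolute."""
    return [
        "java", "-jar", os.path.abspath(jar),
        os.path.abspath(input_path),
        os.path.abspath(output_path),
    ]
```

The resulting list can be passed to subprocess.call() from any working directory.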

Running under Tomcat:
bin/startup.sh
http://localhost:8080/converter/
$ wget http://localhost:8080/converter/service \
--post-file=document.odt \
--header="Content-Type: application/vnd.oasis.opendocument.text" \
--header="Accept: application/pdf" \
--output-document=document.pdf
.. doesn't cope well with a document exploding into HTML plus images
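
The wget call above can be approximated from Python with the standard library. This is a rough sketch for a current Python 3, assuming the converter webapp is running at the URL used above; the function names build_conversion_request and convert are hypothetical.

```python
import urllib.request

SERVICE_URL = "http://localhost:8080/converter/service"  # same URL as the wget call


def build_conversion_request(data, url=SERVICE_URL):
    """Build (but do not send) a POST asking the service to turn ODT into PDF."""
    headers = {
        "Content-Type": "application/vnd.oasis.opendocument.text",
        "Accept": "application/pdf",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def convert(input_path, output_path):
    """Send the document and write the response body to output_path."""
    with open(input_path, "rb") as src:
        request = build_conversion_request(src.read())
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as dst:
        dst.write(response.read())
```

convert("document.odt", "document.pdf") would then mirror the wget example. As with wget, a single response body cannot carry HTML plus its images.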

Sun Wiki Publisher Extension:
http://extensions.services.openoffice.org/project/wikipublisher?intcmp=1547
Source:
http://sw.openoffice.org/source/browse/sw/swext/mediawiki/

Python:
http://wiki.services.openoffice.org/wiki/Python
$ export DYLD_LIBRARY_PATH="/Applications/OpenOffice.org.app/Contents/program/"
os.system('DYLD_LIBRARY_PATH="/Applications/OpenOffice.org.app/Contents/program/" /usr/bin/python2.3 ooextract.py --pdf test.odt')
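
The os.system call above sets DYLD_LIBRARY_PATH through the shell. A possible alternative is to pass the environment explicitly, which avoids shell quoting. This is a sketch only, written for a current Python 3; the helper name ooextract_invocation is hypothetical, and the paths are the ones from the lines above.

```python
import os

OOO_PROGRAM_DIR = "/Applications/OpenOffice.org.app/Contents/program/"


def ooextract_invocation(document, python="/usr/bin/python2.3"):
    """Return (argv, env) for running ooextract.py with DYLD_LIBRARY_PATH set."""
    env = dict(os.environ)  # copy, so the caller's environment is left untouched
    env["DYLD_LIBRARY_PATH"] = OOO_PROGRAM_DIR
    argv = [python, "ooextract.py", "--pdf", document]
    return argv, env
```

The pair can be used as argv, env = ooextract_invocation("test.odt") followed by subprocess.call(argv, env=env).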
http://qa.openoffice.org/issues/long_list.cgi?issuelist=93084
"""
For Mac OS 10.4 (Tiger) the default python 2.3 interpreter works fine. The default python 2.5 interpreter that came with Mac OS 10.5 (Leopard) gave me the following error:
Fatal Python error: Interpreter not initialized (version mismatch?)
Abort trap
I tried running the script with python versions 2.3.4, 2.3.5, and 2.3.7; they all failed with the same error.
"""
http://reidransom.com/geek/scripting-openoffice-org-app-with-python-on-mac-os-x/
http://udk.openoffice.org/python/python-bridge.html#replacing
http://udk.openoffice.org/python/python-bridge.html

"""
UnoConv:
http://dag.wieers.com/home-made/unoconv/
unoconv converts between any document format that OpenOffice understands. It uses OpenOffice's UNO bindings for non-interactive conversion of documents.
Supported document formats include Open Document Format (.odt), MS Word (.doc), MS Office Open/MS OOXML (.xml), Portable Document Format (.pdf), HTML, XHTML, RTF, Docbook (.xml), and more.
"""

Building Python from source for Mac OSX:
Hacked pyconfig.h:
#undef _POSIX_C_SOURCE
#undef _XOPEN_SOURCE
#define HAVE_BROKEN_POSIX_SEMAPHORES
Hacked Makefile:
prefix= /System/Library/Frameworks/Python.framework/Versions/2.3.4
and removed "-u __dummy" from LINKFORSHARED line
Hacked ..openoffice.org .. basis-link/program/uno.py:
import sys
if sys.platform == 'darwin':
    # make sure libpyuno.dylib is found
    import os
    newpath = os.path.split( __file__ )[0]
    cwd = os.getcwd()
    os.chdir( newpath )
    import pyuno
    os.chdir( cwd )
else:
    import pyuno
import __builtin__
#
# PyODConverter (Python OpenDocument Converter) v1.1 - 2009-11-14
#
# This script converts a document from one office format to another by
# connecting to an OpenOffice.org instance via Python-UNO bridge.
#
# Copyright (C) 2008-2009 Mirko Nasato <mirko@artofsolving.com>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl-2.1.html
# - or any later version.
#
DEFAULT_OPENOFFICE_PORT = 8100

import uno
from os.path import abspath, isfile, splitext
from com.sun.star.beans import PropertyValue
from com.sun.star.task import ErrorCodeIOException
from com.sun.star.connection import NoConnectException

FAMILY_TEXT = "Text"
FAMILY_WEB = "Web"
FAMILY_SPREADSHEET = "Spreadsheet"
FAMILY_PRESENTATION = "Presentation"
FAMILY_DRAWING = "Drawing"

#---------------------#
# Configuration Start #
#---------------------#

# see http://wiki.services.openoffice.org/wiki/Framework/Article/Filter

# most formats are auto-detected; only those requiring options are defined here
IMPORT_FILTER_MAP = {
    "txt": {
        "FilterName": "Text (encoded)",
        "FilterOptions": "utf8"
    },
    "csv": {
        "FilterName": "Text - txt - csv (StarCalc)",
        "FilterOptions": "44,34,0"
    }
}

EXPORT_FILTER_MAP = {
    "pdf": {
        FAMILY_TEXT: { "FilterName": "writer_pdf_Export" },
        FAMILY_WEB: { "FilterName": "writer_web_pdf_Export" },
        FAMILY_SPREADSHEET: { "FilterName": "calc_pdf_Export" },
        FAMILY_PRESENTATION: { "FilterName": "impress_pdf_Export" },
        FAMILY_DRAWING: { "FilterName": "draw_pdf_Export" }
    },
    "html": {
        FAMILY_TEXT: { "FilterName": "HTML (StarWriter)" },
        FAMILY_SPREADSHEET: { "FilterName": "HTML (StarCalc)" },
        FAMILY_PRESENTATION: { "FilterName": "impress_html_Export" }
    },
    "odt": {
        FAMILY_TEXT: { "FilterName": "writer8" },
        FAMILY_WEB: { "FilterName": "writerweb8_writer" }
    },
    "doc": {
        FAMILY_TEXT: { "FilterName": "MS Word 97" }
    },
    "rtf": {
        FAMILY_TEXT: { "FilterName": "Rich Text Format" }
    },
    "txt": {
        FAMILY_TEXT: {
            "FilterName": "Text",
            "FilterOptions": "utf8"
        }
    },
    "ods": {
        FAMILY_SPREADSHEET: { "FilterName": "calc8" }
    },
    "xls": {
        FAMILY_SPREADSHEET: { "FilterName": "MS Excel 97" }
    },
    "csv": {
        FAMILY_SPREADSHEET: {
            "FilterName": "Text - txt - csv (StarCalc)",
            "FilterOptions": "44,34,0"
        }
    },
    "odp": {
        FAMILY_PRESENTATION: { "FilterName": "impress8" }
    },
    "ppt": {
        FAMILY_PRESENTATION: { "FilterName": "MS PowerPoint 97" }
    },
    "swf": {
        FAMILY_DRAWING: { "FilterName": "draw_flash_Export" },
        FAMILY_PRESENTATION: { "FilterName": "impress_flash_Export" }
    }
}

PAGE_STYLE_OVERRIDE_PROPERTIES = {
    FAMILY_SPREADSHEET: {
        #--- Scale options: uncomment 1 of the 3 ---
        # a) 'Reduce / enlarge printout': 'Scaling factor'
        "PageScale": 100,
        # b) 'Fit print range(s) to width / height': 'Width in pages' and 'Height in pages'
        #"ScaleToPagesX": 1, "ScaleToPagesY": 1000,
        # c) 'Fit print range(s) on number of pages': 'Fit print range(s) on number of pages'
        #"ScaleToPages": 1,
        "PrintGrid": False
    }
}

#-------------------#
# Configuration End #
#-------------------#

class DocumentConversionException(Exception):

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return self.message


class DocumentConverter:

    def __init__(self, port=DEFAULT_OPENOFFICE_PORT):
        localContext = uno.getComponentContext()
        resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
        try:
            context = resolver.resolve("uno:socket,host=localhost,port=%s;urp;StarOffice.ComponentContext" % port)
        except NoConnectException:
            raise DocumentConversionException, "failed to connect to OpenOffice.org on port %s" % port
        self.desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)

    def convert(self, inputFile, outputFile):
        inputUrl = self._toFileUrl(inputFile)
        outputUrl = self._toFileUrl(outputFile)

        loadProperties = { "Hidden": True }
        inputExt = self._getFileExt(inputFile)
        if IMPORT_FILTER_MAP.has_key(inputExt):
            loadProperties.update(IMPORT_FILTER_MAP[inputExt])

        document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, self._toProperties(loadProperties))
        try:
            document.refresh()
        except AttributeError:
            pass

        family = self._detectFamily(document)
        self._overridePageStyleProperties(document, family)

        outputExt = self._getFileExt(outputFile)
        storeProperties = self._getStoreProperties(document, outputExt)

        try:
            document.storeToURL(outputUrl, self._toProperties(storeProperties))
        finally:
            document.close(True)

    def _overridePageStyleProperties(self, document, family):
        if PAGE_STYLE_OVERRIDE_PROPERTIES.has_key(family):
            properties = PAGE_STYLE_OVERRIDE_PROPERTIES[family]
            pageStyles = document.getStyleFamilies().getByName('PageStyles')
            for styleName in pageStyles.getElementNames():
                pageStyle = pageStyles.getByName(styleName)
                for name, value in properties.items():
                    pageStyle.setPropertyValue(name, value)

    def _getStoreProperties(self, document, outputExt):
        family = self._detectFamily(document)
        try:
            propertiesByFamily = EXPORT_FILTER_MAP[outputExt]
        except KeyError:
            raise DocumentConversionException, "unknown output format: '%s'" % outputExt
        try:
            return propertiesByFamily[family]
        except KeyError:
            raise DocumentConversionException, "unsupported conversion: from '%s' to '%s'" % (family, outputExt)

    def _detectFamily(self, document):
        if document.supportsService("com.sun.star.text.WebDocument"):
            return FAMILY_WEB
        if document.supportsService("com.sun.star.text.GenericTextDocument"):
            # must be TextDocument or GlobalDocument
            return FAMILY_TEXT
        if document.supportsService("com.sun.star.sheet.SpreadsheetDocument"):
            return FAMILY_SPREADSHEET
        if document.supportsService("com.sun.star.presentation.PresentationDocument"):
            return FAMILY_PRESENTATION
        if document.supportsService("com.sun.star.drawing.DrawingDocument"):
            return FAMILY_DRAWING
        raise DocumentConversionException, "unknown document family: %s" % document

    def _getFileExt(self, path):
        ext = splitext(path)[1]
        if ext is not None:
            return ext[1:].lower()

    def _toFileUrl(self, path):
        return uno.systemPathToFileUrl(abspath(path))

    def _toProperties(self, dict):
        props = []
        for key in dict:
            prop = PropertyValue()
            prop.Name = key
            prop.Value = dict[key]
            props.append(prop)
        return tuple(props)


if __name__ == "__main__":
    from sys import argv, exit

    if len(argv) < 3:
        print "USAGE: python %s <input-file> <output-file>" % argv[0]
        exit(255)
    if not isfile(argv[1]):
        print "no such input file: %s" % argv[1]
        exit(1)

    try:
        converter = DocumentConverter()
        converter.convert(argv[1], argv[2])
    except DocumentConversionException, exception:
        print "ERROR! " + str(exception)
        exit(1)
    except ErrorCodeIOException, exception:
        print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
        exit(1)
# OpenOffice utils.
#
# Based on code from:
# PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
# Copyright (C) 2008 Mirko Nasato <mirko@artofsolving.com>
# Licensed under the GNU LGPL v2.1 - or any later version.
# http://www.gnu.org/licenses/lgpl-2.1.html
#

import sys
import os
import time
import atexit

OPENOFFICE_PORT = 8100

# Find OpenOffice.
_oopaths=(
    ('/usr/lib64/ooo-2.0/program', '/usr/lib64/ooo-2.0/program'),
    ('/opt/openoffice.org3/program', '/opt/openoffice.org/basis3.0/program'),
)

for p in _oopaths:
    if os.path.exists(p[0]):
        OPENOFFICE_PATH = p[0]
        OPENOFFICE_BIN = os.path.join(OPENOFFICE_PATH, 'soffice')
        OPENOFFICE_LIBPATH = p[1]

        # Add to path so we can find uno.
        if sys.path.count(OPENOFFICE_LIBPATH) == 0:
            sys.path.insert(0, OPENOFFICE_LIBPATH)
        break

import uno
from com.sun.star.beans import PropertyValue
from com.sun.star.connection import NoConnectException


class OORunner:
    """
    Start, stop, and connect to OpenOffice.
    """
    def __init__(self, port=OPENOFFICE_PORT):
        """ Create OORunner that connects on the specified port. """
        self.port = port

    def connect(self, no_startup=False):
        """
        Connect to OpenOffice.
        If a connection cannot be established try to start OpenOffice.
        """
        localContext = uno.getComponentContext()
        resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
        context = None
        did_start = False

        n = 0
        while n < 6:
            try:
                context = resolver.resolve("uno:socket,host=localhost,port=%d;urp;StarOffice.ComponentContext" % self.port)
                break
            except NoConnectException:
                pass

            # If first connect failed then try starting OpenOffice.
            if n == 0:
                # Exit loop if startup not desired.
                if no_startup:
                    break
                self.startup()
                did_start = True

            # Pause and try again to connect
            time.sleep(1)
            n += 1

        if not context:
            raise Exception, "Failed to connect to OpenOffice on port %d" % self.port

        desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)

        if not desktop:
            raise Exception, "Failed to create OpenOffice desktop on port %d" % self.port

        if did_start:
            _started_desktops[self.port] = desktop

        return desktop

    def startup(self):
        """
        Start a headless instance of OpenOffice.
        """
        args = [OPENOFFICE_BIN,
                '-accept=socket,host=localhost,port=%d;urp;StarOffice.ServiceManager' % self.port,
                '-norestore',
                '-nofirststartwizard',
                '-nologo',
                '-headless',
                ]
        env = {'PATH' : '/bin:/usr/bin:%s' % OPENOFFICE_PATH,
               'PYTHONPATH' : OPENOFFICE_LIBPATH,
               }

        try:
            pid = os.spawnve(os.P_NOWAIT, args[0], args, env)
        except Exception, e:
            raise Exception, "Failed to start OpenOffice on port %d: %s" % (self.port, e.message)

        if pid <= 0:
            raise Exception, "Failed to start OpenOffice on port %d" % self.port

    def shutdown(self):
        """
        Shutdown OpenOffice.
        """
        try:
            if _started_desktops.get(self.port):
                _started_desktops[self.port].terminate()
                del _started_desktops[self.port]
        except Exception, e:
            pass


# Keep track of started desktops and shut them down on exit.
_started_desktops = {}

def _shutdown_desktops():
    """ Shutdown all OpenOffice desktops that were started by the program. """
    for port, desktop in _started_desktops.items():
        try:
            if desktop:
                desktop.terminate()
        except Exception, e:
            pass

atexit.register(_shutdown_desktops)


def oo_shutdown_if_running(port=OPENOFFICE_PORT):
    """ Shutdown OpenOffice if it's running on the specified port. """
    oorunner = OORunner(port)
    try:
        desktop = oorunner.connect(no_startup=True)
        desktop.terminate()
    except Exception, e:
        pass


def oo_properties(**args):
    """
    Convert args to OpenOffice property values.
    """
    props = []
    for key in args:
        prop = PropertyValue()
        prop.Name = key
        prop.Value = args[key]
        props.append(prop)
    return tuple(props)
#!/bin/bash
OOo_HOME=/usr/bin
SOFFICE_PATH=$OOo_HOME/soffice
PIDFILE=/var/run/openoffice-server.pid
set -e
case "$1" in
    start)
        if [ -f $PIDFILE ]; then
            echo "OpenOffice headless server has already started."
            sleep 5
            exit
        fi
        echo "Starting OpenOffice headless server"
        # Redirect output first, then background; "& > /dev/null" would leave the output unredirected.
        $SOFFICE_PATH -display :1 -headless -nologo -nofirststartwizard -accept="socket,host=127.0.0.1,port=8100;urp" > /dev/null 2>&1 &
        touch $PIDFILE
        ;;
    stop)
        if [ -f $PIDFILE ]; then
            echo "Stopping OpenOffice headless server."
            killall -9 soffice && killall -9 soffice.bin
            rm -f $PIDFILE
            exit
        fi
        echo "OpenOffice headless server is not running."
        exit
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
esac
exit 0
#!/bin/sh

xvfb_start() {
    if [ -x /usr/bin/Xvfb ]; then
        echo "Starting Virtual Frame Buffer X Server (Xvfb) as local display :1.0"
        echo " /usr/bin/Xvfb :1 -screen 0 800x600x16 -fbdir /usr/src"
        /usr/bin/Xvfb :1 -screen 0 800x600x16 -fbdir /usr/src &
    else
        echo "Error: Could not find /usr/bin/Xvfb. Cannot start Xvfb."
    fi
}

xvfb_stop() {
    if [ -x /usr/bin/killall ]; then
        echo "Stopping Virtual Frame Buffer X Server (Xvfb) for local display :1.0"
        /usr/bin/killall Xvfb 2> /dev/null
    else
        echo "Error: Could not find /usr/bin/killall. Cannot stop Xvfb."
    fi
}

case "$1" in
    'start')
        xvfb_start
        ;;
    'stop')
        xvfb_stop
        ;;
    'restart')
        xvfb_stop
        sleep 1
        xvfb_start
        ;;
    *)
        if [ -x /usr/bin/basename ]; then
            echo "usage: `/usr/bin/basename $0` start|stop|restart"
        else
            echo "usage: $0 start|stop|restart"
        fi
esac