We want to use pdftk, a binary CLI program, to generate PDF files.
https://www.pdflabs.com/tools/pdftk-server/
We don't have access to apt-get at runtime on Modulus, so we can't just apt-get install pdftk. So, can we compile from source and build a binary that we can ship alongside the Meteor source code?
To avoid weird toolchain/cross-compilation issues, let's do the build in the same environment that it will run in. Handily, Modulus supplies the Dockerfiles for the images we run inside of:
- http://blog.modulus.io/open-sourcing-our-docker-images
- https://github.com/onmodulus/docker-images
- https://github.com/onmodulus/docker-run-base
To get going, first set up a lightweight VM (Tiny Core Linux on VirtualBox) that can act as the Docker host, since OS X can't run the Docker daemon natively. http://boot2docker.io/
brew install boot2docker
Follow the brew output to get the vm and the Docker daemon up and running. Once live, let's build the modulus images.
Also install the Docker client. It can run natively on OS X since it just talks to the daemon.
brew install docker
There are multiple images involved, from general to specific:
- https://github.com/phusion/baseimage-docker
- https://github.com/onmodulus/docker-base
- https://github.com/onmodulus/docker-run-base
- https://github.com/onmodulus/docker-run-node
baseimage-docker is in the public registry and will be pulled in automatically. All of the onmodulus images need to be built and registered locally. Clone each onmodulus repo, cd in, and, in the order above, run
docker build -t <name of image> .
This will build the image and can take a while. The -t <name of image> option also tags the output and registers it locally so that we can run it and build descendant images from it.
After each step you can check what images you have via docker images.
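The clone-and-build sequence above can be sketched as a small script. It is a dry run by default and only prints the commands. The image names (onmodulus/<repo>) are an assumption, not something the repos are confirmed to require: each name has to match the FROM line of the Dockerfile that builds on it, so check those before running with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: clone and build the onmodulus images, parents first.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_images() {
    # Order matters: each image is the base of the next one.
    for repo in docker-base docker-run-base docker-run-node; do
        run git clone "https://github.com/onmodulus/$repo.git"
        # Assumed tag: onmodulus/<repo>. Adjust to match the FROM line
        # of the descendant Dockerfile (it may include a version tag).
        run docker build -t "onmodulus/$repo" "$repo"
    done
}

build_images
```

Passing the cloned directory as the build context is equivalent to cd-ing in and building with a trailing dot.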
Once all images are built, we need to set up the environment to match the assumptions that Modulus makes. Notably, we need to mount a filesystem laid out as described here:
/mnt
The volume mounted at /mnt requires the following subdirectories to be created by the host system and accessible by the mop user/group.
/mnt/tmp Temporary storage. The TEMP_DIR environment variable is defined to here.
/mnt/home The mop user's home directory. The HOME environment variable is defined to here.
/mnt/log Application stdout/stderr is placed in this directory with the filename app.log.
/mnt/app The application itself is placed in this directory.
/mnt/notifications Crash and other notifications, generated by supervisor, are placed here.
/mnt/app-storage Persistent storage is mounted here. It's also mounted to /app-storage at runtime.
/mnt/supervisor.conf The supervisor daemon is run with this configuration file.
You can put this directory wherever you like; just adjust the -v argument to specify where the volume lives when running docker. I punted on permissions setup and just made everything 777.
egoldblum@Ethans-MacBook-Pro(15:56:31):~$ ls -l host-folder/
total 0
drwxrwxrwx 2 egoldblum staff 68 Jul 23 12:19 app/
drwxrwxrwx 2 egoldblum staff 68 Jul 23 12:19 home/
drwxrwxrwx 3 egoldblum staff 102 Jul 23 12:23 log/
-rwxrwxrwx 1 egoldblum staff 0 Jul 23 12:19 supervisor.conf*
drwxrwxrwx 2 egoldblum staff 68 Jul 23 15:19 tmp/
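A minimal sketch of creating that layout, assuming the same punt on permissions (everything 777). The function name make_modulus_mount and the HOST_DIR variable are made up for illustration. It creates every entry from the /mnt description, including notifications and app-storage, which do not appear in the listing above.

```shell
#!/bin/sh
# Sketch: create the directory layout Modulus expects under /mnt.
# make_modulus_mount and HOST_DIR are hypothetical names.
make_modulus_mount() {
    root="$1"
    for d in tmp home log app notifications app-storage; do
        mkdir -p "$root/$d"
    done
    # supervisor.conf is a file, not a directory; create it empty if missing.
    touch "$root/supervisor.conf"
    # Wide-open permissions, as in the text. Fine for a throwaway build box only.
    chmod -R 777 "$root"
}

make_modulus_mount "${HOST_DIR:-./host-folder}"
```

Run it with HOST_DIR=~/host-folder to match the path used for the -v argument.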
Run the image, allocating a tty, mounting the volume to match where you created it locally, mapping port 80 inside the container to 8080 outside, and dropping into a bash shell. --rm removes the container when it exits, so nothing is left behind between runs.
docker run --rm -v ~/host-folder:/mnt -p 8080:80 -t -i onmodulus/docker-run-node:0.0.1 /sbin/my_init -- bash -l
If it worked you should be sitting at a bash prompt as root inside the container. Setup is done, let's build stuff.
Let's see what packages we have installed
apt --installed list
https://gist.github.com/egoldblum/9ec942849ea5424f52aa
There's a lot, but not everything we need to build pdftk from source according to http://packages.ubuntu.com/trusty/pdftk
Get the source while we're at it
wget https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk-2.02-src.zip
unzip pdftk-2.02-src.zip
Install the dependencies for building
apt-get install libgcj14
apt-get install gcj-jdk
Let's check our toolchain versions
root@791899f6f4bc:/# gcj --version | head -1
gcj (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
root@791899f6f4bc:/# gcc --version | head -1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
Looks like we're using 4.8, so let's make the Makefile aware of our toolchain. We're in an Ubuntu image, so use Makefile.Debian and set the version suffix in it:
export VERSUFF=-4.8
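Rather than hard-coding the suffix, it can be derived from the compiler's version line. This is a sketch: versuff_from is a made-up helper, and it assumes the version number is the last field of the line, as in the gcj output above.

```shell
#!/bin/sh
# Sketch: turn a `gcj --version` first line into a suffix like "-4.8".
# versuff_from is a hypothetical helper name.
versuff_from() {
    # Take the last field (e.g. 4.8.4), keep major.minor, prefix a dash.
    printf '%s\n' "$1" | awk '{ print $NF }' | cut -d. -f1,2 | sed 's/^/-/'
}

versuff_from "gcj (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"   # prints -4.8
```

Inside the container, the result could be passed on the make command line, e.g. make -f Makefile.Debian VERSUFF="$(versuff_from "$(gcj --version | head -1)")", since variables given on the command line take precedence over ordinary assignments in the Makefile.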
Go build it
make -f Makefile.Debian
After a while...
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# file pdftk
pdftk: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=ef4e19fbf402e8d8cd32bdae1ade018f3ff551b5, not stripped
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ./pdftk --version
pdftk 2.02 a Handy Tool for Manipulating PDF Documents
Copyright (c) 2003-13 Steward and Lee, LLC - Please Visit: www.pdftk.com
This is free software; see the source code for copying conditions. There is
NO warranty, not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
We didn't adjust any compile/link flags yet, so let's see what libraries this thing is using
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ldd pdftk
linux-vdso.so.1 => (0x00007ffc839b1000)
libgcj.so.14 => /usr/lib/x86_64-linux-gnu/libgcj.so.14 (0x00007fa8389ac000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa8386a8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa838492000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8380cd000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa837eaf000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa837ca7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa837aa3000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fa83788a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa83bafc000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa837584000)
Looks like a lot of them, including libgcj.so.14, which we had to install manually. This means that the resulting binary won't run on the unmodified Modulus image, since that dynamic library doesn't exist there.
Since all of these are dynamically linked, the resulting binary is quite reasonable at about 4 MB.
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ls -lah pdftk
-rwxr-xr-x 1 root root 3.9M Jul 23 20:27 pdftk
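One way to confirm which libraries are missing is to copy the binary into a fresh, unmodified container and filter the ldd output for entries that did not resolve. The sketch below only parses text. missing_libs is a made-up helper, and the sample input is illustrative, not captured from a real run.

```shell
#!/bin/sh
# Sketch: read `ldd` output on stdin and print the sonames that did not resolve.
# missing_libs is a hypothetical helper name.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Illustrative input only: what ldd might report where libgcj is absent.
printf '\tlibgcj.so.14 => not found\n\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8380cd000)\n' \
    | missing_libs   # prints libgcj.so.14
```

On the stock image this would be used as ldd ./pdftk | missing_libs.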
So, can we figure out a way to statically compile & link all dependencies into one (fat) binary that we can execute on modulus?
Let's add some flags to Makefile.Debian to instruct all of the tools to build/link statically.
export CPPFLAGS= -DPATH_DELIM=0x2f -DASK_ABOUT_WARNINGS=false -DUNBLOCK_SIGNALS -fdollars-in-identifiers -static
export CXXFLAGS= -Wall -Wextra -Weffc++ -O2 -static
export GCJFLAGS= -fsource=1.3 -O2 -static-libgcj
export GCJHFLAGS= -force
export LDLIBS= -lgcj
Most importantly, -static-libgcj tells gcj to use a static version of libgcj:
https://gcc.gnu.org/wiki/Statically_linking_libgcj
Clean and build again, and make complains. Uh-oh.
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# make clean -f Makefile.Debian > /dev/null
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# make -f Makefile.Debian
<snip>
make -f Makefile -iC /pdftk-2.02-dist/pdftk/../java all
make[1]: Entering directory `/pdftk-2.02-dist/java'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/pdftk-2.02-dist/java'
g++-4.8 -Wall -Wextra -Weffc++ -O2 -static attachments.o report.o passwords.o pdftk.o /pdftk-2.02-dist/pdftk/../java/java_lib.o -lgcj -o pdftk
/usr/bin/ld: cannot find -lgcj
collect2: error: ld returned 1 exit status
make: *** [pdftk] Error 1
Where is the libgcj.a static archive to link against? It turns out that gcj doesn't ship with it, since static linking can be buggy/error-prone. That's true on Ubuntu and Red Hat at least.
https://bugzilla.redhat.com/show_bug.cgi?id=1004507#c1
Statically linking gcj doesn't really work, which is why we are intentionally not shipping libgcj.a.
If you want to compile/link programs that don't depend on particular libgcj.so version, use -findirect-dispatch (both for compilation and linking).
If you don't want the executable to depend on libgcj, you can prepend -static-libgcj to the gcj command-line, but that won't work with the stock gcj package on Ubuntu Lucid, because libgcj.a was not included in the package. However, if you compile your own GCC (and enable Java), that will support -static-libgcj .
So we don't have an archive to link against.
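A quick way to check this on any given image is to search the library directories for the archive. This is a sketch: find_static_archive is a made-up helper, and the directories searched are an assumption about where a distro package would put the file.

```shell
#!/bin/sh
# Sketch: look for lib<stem>.a under the given directories.
# find_static_archive is a hypothetical helper name.
find_static_archive() {
    stem="$1"
    shift
    # Errors (missing or unreadable directories) are ignored on purpose.
    find "$@" -name "lib${stem}.a" 2>/dev/null || true
}

archive="$(find_static_archive gcj /usr/lib /usr/local/lib)"
if [ -n "$archive" ]; then
    echo "found: $archive"
else
    echo "no libgcj.a found; -static-libgcj has nothing to link against"
fi
```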
A German developer apparently got this working with an older toolchain and a compiler that includes libgcj.a:
http://dokupuppylinux.info/programs:pdf_manipulation#pdftk_141_statically_linked
We may be able to compile libgcj into a static archive ourselves. To be continued?