

@egoldblum
Last active August 29, 2015 14:25

We want to use pdftk, a binary CLI program, to generate PDF files: https://www.pdflabs.com/tools/pdftk-server/

We don't have access to apt-get at runtime on Modulus, so we can't just apt-get install pdftk. Can we instead compile it from source and build a binary that we can ship alongside the Meteor source code?

To avoid weird toolchain and cross-compilation issues, let's do the build in the same environment the binary will run in. Handily, Modulus supplies the Dockerfiles for the images our apps run inside:

To get going, first set up a lightweight VM (Tiny Core Linux on VirtualBox) to act as the Docker host, since OS X can't run the Docker daemon natively: http://boot2docker.io/

brew install boot2docker

Follow the brew output to get the VM and the Docker daemon up and running. Once they're live, we can build the Modulus images.

Also install the Docker client; it runs natively on OS X since it only talks to the daemon.

brew install docker

There are multiple images involved, from general to specific:

baseimage-docker is in the public registry and will be pulled in automatically. The onmodulus images, however, need to be built and registered locally: clone each onmodulus repo, cd into it, and, in the order above, run

docker build -t <name of image> .

This builds the image and can take a while. The -t <name of image> flag also tags the output and registers it locally, so that we can run it and build descendant images from it.

After each step, you can check which images you have with docker images.
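Because the images have to be built general-to-specific, the clone/cd/build loop can be sketched as a small shell function. The checkout directories and tags it takes are hypothetical placeholders (the original list of repos isn't reproduced here), and DOCKER can be overridden, e.g. with echo, for a dry run.

```shell
#!/usr/bin/env bash
# Build each locally required image in order, from general to specific.
# ASSUMPTION: arguments are hypothetical "checkout-dir=image-tag" pairs;
# substitute the actual onmodulus repos and tags, in dependency order.
set -euo pipefail

DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

build_in_order() {
  local entry dir tag
  for entry in "$@"; do
    dir="${entry%%=*}"   # everything before the first '=': the repo checkout
    tag="${entry#*=}"    # everything after the first '=': the image tag
    # Build in a subshell so the caller's working directory is untouched.
    ( cd "$dir" && "$DOCKER" build -t "$tag" . )
  done
  "$DOCKER" images       # list what's registered locally afterwards
}
```

For example, build_in_order ./some-base-repo=hypothetical/base ./docker-run-node=onmodulus/docker-run-node:0.0.1 would build the (hypothetical) base first and then the image used in the docker run command below. Passing the order explicitly keeps the dependency chain visible at the call site.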

Once all images are built, we need to set up the environment to match the assumptions Modulus makes. Notably, we need to mount a filesystem as described:

/mnt

The volume mounted at /mnt requires the following subdirectories to be created by the host system and accessible by the mop user/group.

/mnt/tmp Temporary storage. The TEMP_DIR environment variable is defined to here.
/mnt/home The mop user's home directory. The HOME environment variable is defined to here.
/mnt/log Application stdout/stderr is placed in this directory with the filename app.log.
/mnt/app The application itself is placed in this directory.
/mnt/notifications Crash and other notifications, generated by supervisor, are placed here.
/mnt/app-storage Persistent storage is mounted here. It's also mounted to /app-storage at runtime.
/mnt/supervisor.conf The supervisor daemon is run with this configuration file.

You can put this directory anywhere; just adjust the -v argument to specify where the volume lives when running docker. I punted on permissions setup and just made everything 777.

egoldblum@Ethans-MacBook-Pro(15:56:31):~$ ls -l host-folder/
total 0
drwxrwxrwx  2 egoldblum  staff   68 Jul 23 12:19 app/
drwxrwxrwx  2 egoldblum  staff   68 Jul 23 12:19 home/
drwxrwxrwx  3 egoldblum  staff  102 Jul 23 12:23 log/
-rwxrwxrwx  1 egoldblum  staff    0 Jul 23 12:19 supervisor.conf*
drwxrwxrwx  2 egoldblum  staff   68 Jul 23 15:19 tmp/
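The layout can be created in one go. This is a minimal sketch assuming the ~/host-folder location used in the docker run command; it creates every entry from the list (including notifications/ and app-storage/, which the listing above lacks) and applies the same blanket 777 shortcut instead of proper mop user/group ownership.

```shell
#!/usr/bin/env bash
# Create the host-side layout that gets mounted at /mnt inside the container.
# ASSUMPTION: the default location ~/host-folder matches the -v flag used
# with docker run; permissions are punted on (777), as described above.
set -euo pipefail

make_mnt_layout() {
  local root="${1:-$HOME/host-folder}"
  local d
  for d in tmp home log app notifications app-storage; do
    mkdir -p "$root/$d"
  done
  touch "$root/supervisor.conf"   # empty placeholder, as in the listing above
  chmod -R 777 "$root"            # world-writable so the mop user can use it
}
```

Running make_mnt_layout with no argument targets ~/host-folder; pass a path to put it elsewhere and adjust -v to match.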

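Run the image, allocating a TTY and keeping stdin open (-t -i), mounting the volume from wherever you created it locally, publishing the container's port 8080 on the Docker host's port 80 (-p takes host:container), and dropping into a bash login shell via baseimage-docker's /sbin/my_init. --rm removes the container automatically when you exit.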

docker run --rm -v ~/host-folder:/mnt -p 80:8080 -t -i onmodulus/docker-run-node:0.0.1 /sbin/my_init -- bash -l

If it worked, you should be sitting at a bash prompt as root inside the container. Setup is done; let's build stuff.

Let's see which packages are installed:

apt list --installed

https://gist.github.com/egoldblum/9ec942849ea5424f52aa

There's a lot, but not everything we need to build pdftk from source according to http://packages.ubuntu.com/trusty/pdftk

Get the source while we're at it

wget https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk-2.02-src.zip

unzip pdftk-2.02-src.zip

Install the dependencies for building

apt-get install libgcj14

apt-get install gcj-jdk

Let's check our toolchain versions

root@791899f6f4bc:/# gcj --version | head -1
gcj (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
root@791899f6f4bc:/# gcc --version | head -1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

Looks like we're on 4.8. We're in an Ubuntu image, so we'll use Makefile.Debian; make it aware of our toolchain by setting its VERSUFF line to:

export VERSUFF=-4.8
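Rather than hard-coding the suffix, it can be derived from the compiler banner and passed on the make command line, since GNU make lets command-line variables override ordinary assignments inside the Makefile. A sketch, assuming the banner keeps the gcj (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4 shape shown above:

```shell
#!/usr/bin/env bash
# Derive a VERSUFF value such as "-4.8" from the first line of `gcj --version`.
# ASSUMPTION: the last field of that line is MAJOR.MINOR.PATCH (e.g. "4.8.4").
set -euo pipefail

versuff_from_banner() {
  awk 'NR == 1 { split($NF, v, "."); print "-" v[1] "." v[2] }'
}

# Usage inside the container (not run here):
#   VERSUFF="$(gcj --version | versuff_from_banner)"
#   make -f Makefile.Debian VERSUFF="$VERSUFF"
```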

Go build it

make -f Makefile.Debian

After a while...

root@791899f6f4bc:/pdftk-2.02-dist/pdftk# file pdftk
pdftk: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=ef4e19fbf402e8d8cd32bdae1ade018f3ff551b5, not stripped

root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ./pdftk --version

pdftk 2.02 a Handy Tool for Manipulating PDF Documents
Copyright (c) 2003-13 Steward and Lee, LLC - Please Visit: www.pdftk.com
This is free software; see the source code for copying conditions. There is
NO warranty, not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

We haven't adjusted any compile/link flags yet, so let's see which libraries this thing is using:

root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ldd pdftk
linux-vdso.so.1 =>  (0x00007ffc839b1000)
libgcj.so.14 => /usr/lib/x86_64-linux-gnu/libgcj.so.14 (0x00007fa8389ac000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa8386a8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa838492000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8380cd000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa837eaf000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa837ca7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa837aa3000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fa83788a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa83bafc000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa837584000)

That's a lot of them, including libgcj.so.14, which we had to install manually. This means the resulting binary won't run on the unmodified Modulus image, since that shared library doesn't exist there.
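To see exactly which of those a stock image is missing, run ldd there and keep only the unresolved entries. A small sketch that filters ldd-style output read from stdin, assuming the libname => not found form ldd prints for libraries it can't resolve:

```shell
#!/usr/bin/env bash
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin; unresolved lines look like
# "libgcj.so.14 => not found".
set -euo pipefail

missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage on a stock image (not run here):
#   ldd ./pdftk | missing_libs
```

An empty result means every dependency resolved on that machine.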

Since all of these are linked dynamically, the binary itself is a reasonable 3.9MB.

root@791899f6f4bc:/pdftk-2.02-dist/pdftk# ls -lah pdftk
-rwxr-xr-x 1 root root 3.9M Jul 23 20:27 pdftk

So, can we figure out a way to statically compile and link all dependencies into one (fat) binary that we can execute on Modulus?

Let's add some flags to Makefile.Debian to instruct all of the tools to build/link statically.

export CPPFLAGS= -DPATH_DELIM=0x2f -DASK_ABOUT_WARNINGS=false -DUNBLOCK_SIGNALS -fdollars-in-identifiers -static
export CXXFLAGS= -Wall -Wextra -Weffc++ -O2 -static
export GCJFLAGS= -fsource=1.3 -O2 -static-libgcj
export GCJHFLAGS= -force
export LDLIBS= -lgcj

Most importantly, -static-libgcj tells gcj to link against a static version of libgcj:

https://gcc.gnu.org/wiki/Statically_linking_libgcj

Clean and build again, and make complains. Uh-oh.

root@791899f6f4bc:/pdftk-2.02-dist/pdftk# make clean -f Makefile.Debian > /dev/null
root@791899f6f4bc:/pdftk-2.02-dist/pdftk# make -f Makefile.Debian
<snip>
make -f Makefile -iC /pdftk-2.02-dist/pdftk/../java all
make[1]: Entering directory `/pdftk-2.02-dist/java'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/pdftk-2.02-dist/java'
g++-4.8 -Wall -Wextra -Weffc++ -O2 -static attachments.o report.o passwords.o pdftk.o /pdftk-2.02-dist/pdftk/../java/java_lib.o  -lgcj -o pdftk
/usr/bin/ld: cannot find -lgcj
collect2: error: ld returned 1 exit status
make: *** [pdftk] Error 1

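Where is the libgcj.a static archive to link against? With -static on the g++ link line, ld only searches for static archives, so -lgcj needs libgcj.a. It turns out gcj doesn't ship with it, because statically linking libgcj is buggy and error-prone. That's true on Ubuntu and Red Hat, at least.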

https://bugzilla.redhat.com/show_bug.cgi?id=1004507#c1

Statically linking gcj doesn't really work, which is why we are intentionally not shipping libgcj.a.
If you want to compile/link programs that don't depend on particular libgcj.so version, use -findirect-dispatch (both for compilation and linking).
If you don't want the executable to depend on libgcj, you can prepend -static-libgcj to the gcj command-line, but that won't work with the stock gcj package on Ubuntu Lucid, because libgcj.a was not included in the package. However, if you compile your own GCC (and enable Java), that will support -static-libgcj .

So we don't have an archive to link against.
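You can ask the compiler driver directly whether it can see a static archive: GCC-family drivers, gcj included, accept -print-file-name=LIB and echo the bare name back when they can't find the file. A sketch of that check (the helper name has_static_lib is made up for illustration):

```shell
#!/usr/bin/env bash
# Succeed if the given compiler driver can locate the named static archive.
# GCC-family drivers print the full path when found, else just the bare name.
set -euo pipefail

has_static_lib() {
  local cc="$1" lib="$2" path
  path="$("$cc" -print-file-name="$lib")"
  [ "$path" != "$lib" ] && [ -f "$path" ]
}

# Usage inside the container (not run here):
#   has_static_lib gcj-4.8 libgcj.a || echo "no libgcj.a to link against"
```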

Some German guy apparently got this working with an older toolchain and a compiler that includes libgcj.a: http://dokupuppylinux.info/programs:pdf_manipulation#pdftk_141_statically_linked

We may be able to compile libgcj into a static archive ourselves. To be continued?

@egoldblum
Author

root@f67a77791344:/mnt# ls -lah libgc* pdftk
lrwxr-xr-x 1 1000 staff   16 Jul 24 17:19 libgcj.so.14 -> libgcj.so.14.0.0
-rw-r--r-- 1 1000 staff  47M Jul 24 17:10 libgcj.so.14.0.0
-rwxr-xr-x 1 1000 staff 3.9M Jul 24 17:17 pdftk
root@f67a77791344:/mnt# ldd pdftk
    linux-vdso.so.1 =>  (0x00007ffdcdde4000)
    libgcj.so.14 => not found
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0593b56000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0593940000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f059357b000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0593275000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f0593e5a000)
root@f67a77791344:/mnt# LD_LIBRARY_PATH=/mnt ./pdftk --version

pdftk 2.02 a Handy Tool for Manipulating PDF Documents
Copyright (c) 2003-13 Steward and Lee, LLC - Please Visit: www.pdftk.com
This is free software; see the source code for copying conditions. There is
NO warranty, not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
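So shipping libgcj.so.14 (plus the libgcj.so.14.0.0 file it points to) next to pdftk and putting that directory on LD_LIBRARY_PATH gets --version running. A hypothetical helper that does this for a bundle directory, so callers don't have to remember the variable:

```shell
#!/usr/bin/env bash
# Run a bundled pdftk with its bundled libgcj.so.14 on the loader search path.
# ASSUMPTION: the (hypothetical) bundle directory holds pdftk, the
# libgcj.so.14 symlink, and libgcj.so.14.0.0, as in the listing above.
set -euo pipefail

run_bundled_pdftk() {
  local dir="$1"; shift
  # Prepend the bundle dir, keeping any existing LD_LIBRARY_PATH entries.
  LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$dir/pdftk" "$@"
}

# Usage (not run here):
#   run_bundled_pdftk /mnt --version
```

Setting the variable only for the pdftk invocation keeps the bundled libgcj from leaking into other processes.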
