One of the many things I do for my group at work is to take care of automating as many things as possible. It usually brings me a lot of satisfaction, mostly because I get a kick out of making people's lives easier.
But sometimes, maybe too often, I end up in drawn-out struggles with machines and programs. And sometimes, these struggles bring me to the edge of despair, so much so that I regularly consider living on a computer-less island growing vegetables for a living.
This is the story of how I had to install Pandoc in a CentOS 6 Docker container. But more generally, this is the story of how I think computing is inherently broken, how programmers (myself included) tend to think that their way is the way, how we're ultimately replicating what most of us think is wrong with society, building upon layers and layers of (best-case scenario) obscure and/or weak foundations.
I would like to extend my gratitude to Google, Stack Overflow, GitHub issues but, mostly, the people who make the beer I drink.
It all starts with this beautifully simple command:
yum install -y pandoc
After resolving about 80 dependencies and downloading around 150 MB of data, here we have it, a beautiful new Pandoc.
Except it isn't so new, because CentOS doesn't really do new, so instead we have Pandoc 1.9.4.1. You'd be tempted to think it doesn't matter so much, and that a few missing features aren't the end of the world. Unfortunately, though, this particular version ships with its very own show-stopping bugs.
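In hindsight, this is the kind of check worth scripting rather than eyeballing. A minimal sketch, assuming GNU sort (for its -V version ordering); the name version_at_least and the 1.12 threshold in the commented usage are made up for illustration, not a statement of which Pandoc release fixes what:

```shell
# version_at_least HAVE WANT
# Succeeds when HAVE is greater than or equal to WANT in version order.
# Relies on GNU sort's -V flag (an assumption about the environment).
version_at_least() {
    # Sort the two versions; if WANT comes out first (or they are equal),
    # then HAVE is at least WANT.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage: the first line of `pandoc --version` reads "pandoc X.Y.Z".
# have=$(pandoc --version | head -n 1 | awk '{print $2}')
# version_at_least "$have" 1.12 || echo "Pandoc $have is too old"
```

With the 1.9.4.1 that yum hands out, the check against 1.12 fails, because 9 sorts before 12 when compared as numbers rather than as text.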
"Do not fret" I hear, let's build a newer version of Pandoc ourselves!
Pandoc is written in Haskell, and a quick Google search (like the hundreds of other quick searches that led to this point) tells me that the compiler for Haskell is named GHC. Why not...
My first reaction is to go:
yum install -y ghc
Quite a bit of time later (GHC is provided with 4 different flavours of every single library), we now have GHC 7.0.4-46.
Fast-forward a couple of hours and I realize that this particular version of GHC is outdated, too...
It's OK, let's install a newer one ourselves then, from the binary distribution:
RUN wget http://www.haskell.org/ghc/dist/7.8.2/ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN tar xf ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN cd ghc-7.8.2 && ./configure
RUN cd ghc-7.8.2 && make install
GHC doesn't need make, just make install; they're pretty proud of that - I couldn't care less.
And that successfully installed the new version of GHC. Getting there, or are we?
Well as it turns out, Pandoc is installed through Cabal, some sort of package-manager-which-isn't-really-one for Haskell.
I tentatively try yum, which of course ends in failure, and proceed to build Cabal:
yum install -y which gmp-devel
Because my base CentOS Docker image doesn't have which, and because Cabal's compilation throws an error about a missing -lgmp, stupid me, it's obviously the gmp-devel package.
wget http://www.haskell.org/cabal/release/cabal-install-1.20.0.3/cabal-install-1.20.0.3.tar.gz
tar xf cabal-install-1.20.0.3.tar.gz
cd cabal-install-1.20.0.3 && ./bootstrap.sh
And it builds, and downloads, and builds some more, and downloads some more...
ln -s /.cabal/bin/cabal /usr/bin/cabal
Because installing Cabal doesn't add Cabal to your path. This is nasty, I don't care, let's move on.
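A slightly less nasty alternative to the symlink might be to put Cabal's bin directory on the PATH instead. A minimal sketch, assuming the same /.cabal/bin location as above; path_prepend is a hypothetical helper name, not something Cabal provides:

```shell
# path_prepend DIR
# Put DIR at the front of PATH unless it is already listed there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise add it at the front
    esac
    export PATH
}

# /.cabal/bin is where the Cabal bootstrap put its executable in this build.
path_prepend /.cabal/bin
```

In a Dockerfile, an export made inside one RUN step does not carry over to the next one, so there the equivalent would be an ENV instruction that sets PATH.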
cabal update
Because it couldn't have done so previously, and so it builds and downloads some more.
By this point, I could have probably hand-written the converted files on a daily basis and still wasted less time. But we're almost there...
So I install Pandoc (and Citeproc), finally:
cabal install pandoc pandoc-citeproc
Hmm, there's an error with a missing UTF-8 locale, let's generate it with this completely obvious command:
localedef -v -c -i en_US -f UTF-8 en_US.UTF-8
Funny thing is, this command actually throws an error, which Docker doesn't like when building, so hey, why not:
RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8 || true
And here we go, error ignored whatever happens.
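Swallowing every error with || true is a bit blunt. A minimal sketch of a follow-up check, assuming the locale really is generated despite the complaint and that locale -a lists it afterwards (glibc systems usually spell it en_US.utf8); has_locale is a hypothetical helper name:

```shell
# has_locale NAME
# Reads a `locale -a` style list on stdin and succeeds if NAME is in it.
# Case and dashes are ignored so "en_US.UTF-8" matches "en_US.utf8".
has_locale() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$want"
}

# Hypothetical usage right after the localedef step:
# locale -a | has_locale en_US.UTF-8 || exit 1
```

That way the build still ignores whatever localedef grumbles about, but stops if the locale genuinely isn't there.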
But wait, building Pandoc still doesn't go through: as it turns out, it's missing an environment variable which, judging from the previous step, we could have sworn was hard-coded somewhere else. Of course it doesn't explicitly tell you so; the depths of the Internet are your friend:
ENV LANG en_US.UTF-8
I rebuild Pandoc and go read the entirety of Les Misérables in the meantime...
Another nasty symlink to the executables, because Cabal doesn't do that either:
ln -s /.cabal/bin/pandoc /usr/bin/pandoc
ln -s /.cabal/bin/pandoc-citeproc /usr/bin/pandoc-citeproc
And here we go, we now have a working Pandoc!
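Given how many steps could have gone wrong quietly, a last sanity check doesn't hurt. A minimal sketch; missing_commands is a hypothetical helper name, and it only checks that the executables can be found on the PATH, not that they actually work:

```shell
# missing_commands NAME...
# Print each NAME that cannot be found on the PATH, one per line.
missing_commands() {
    for name in "$@"; do
        command -v "$name" >/dev/null 2>&1 || printf '%s\n' "$name"
    done
}

# Hypothetical usage at the end of the build:
# missing=$(missing_commands cabal pandoc pandoc-citeproc)
# [ -z "$missing" ] || { echo "still missing: $missing"; exit 1; }
```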
You may not believe it, but this is a really concise version of what I actually had to go through.
I don't doubt that I'm not the sharpest tool in the box, or that there are shortcuts I could have taken, but in my view this process is completely crazy.
In my opinion, we've done this to ourselves, maybe because of developer ego, maybe because we're not as smart as we think we are, or maybe because we and the rest of the world expect too much, too quickly. I don't have an answer, nor do I have a solution to the whole problem, but I'll definitely think twice about telling anybody else to get into this business.
Seems to have got easier now if you use stack as per the official pandoc page.
Here is a Dockerfile for RHEL 6.