@grownseed
Last active October 12, 2023 17:36
Give me back my sanity

One of the many things I do for my group at work is to take care of automating as many things as possible. It usually brings me a lot of satisfaction, mostly because I get a kick out of making people's lives easier.

But sometimes, maybe too often, I end up in drawn-out struggles with machines and programs. And sometimes, these struggles bring me to the edge of despair, so much so that I regularly consider living on a computer-less island growing vegetables for a living.

This is the story of how I had to install Pandoc in a CentOS 6 Docker container. But more generally, this is the story of how I think computing is inherently broken, how programmers (myself included) tend to think that their way is the way, how we're ultimately replicating what most of us think is wrong with society, building upon layers and layers of (best-case scenario) obscure and/or weak foundations.

I would like to extend my gratitude to Google, StackOverflow, and GitHub issues, but mostly to the people who make the beer I drink.


It all starts with this beautifully simple command:

yum install -y pandoc

After resolving about 80 dependencies and downloading around 150 MB of data, here we have it: a beautiful new Pandoc.

Except it isn't so new, because CentOS doesn't really do new, so instead we have Pandoc 1.9.4.1. You'd be tempted to think it doesn't matter so much, and that a few missing features aren't the end of the world. Unfortunately, this particular version ships with its very own show-stopping bugs.
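Before going down the rabbit hole, it can help to check whether the packaged version is actually too old. Here is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available and that the first line of `pandoc --version` looks like `pandoc 1.9.4.1`. The helper names `version_ge` and `pandoc_version` are made up for illustration.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumption: GNU sort with -V (version sort) is installed.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# pandoc_version: print the version number taken from the first line of
# `pandoc --version`; fail when pandoc is not on the PATH.
pandoc_version() {
    command -v pandoc >/dev/null 2>&1 || return 1
    pandoc --version | head -n 1 | awk '{ print $2 }'
}
```

Something like `version_ge "$(pandoc_version)" 1.13.1 || echo "packaged pandoc is too old"` would then flag the yum package. The 1.13.1 threshold is only an example; use whatever version fixes the bugs you are hitting.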


"Do not fret" I hear, let's build a newer version of Pandoc ourselves!

Pandoc is written in Haskell, and a quick Google search (like the hundreds of other quick searches that led to this point) tells me that the compiler for Haskell is named GHC. Why not...

My first reaction is to go:

yum install -y ghc

Quite a bit of time later (GHC is provided with 4 different flavours of every single library), we now have GHC 7.0.4-46.

Fast-forward a couple of hours, and I realize that this particular version of GHC is outdated, too...

It's ok, let's build it ourselves then:

RUN wget http://www.haskell.org/ghc/dist/7.8.2/ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN tar xf ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN cd ghc-7.8.2 && ./configure
RUN cd ghc-7.8.2 && make install

GHC doesn't need make, just make install (the tarball is a prebuilt binary distribution); they're pretty proud of that - I couldn't care less.

And that successfully installed the new version of GHC. Getting there, or are we?


Well as it turns out, Pandoc is installed through Cabal, some sort of package-manager-which-isn't-really-one for Haskell.

I tentatively try yum, which of course ends in failure, and proceed to build Cabal:

yum install -y which gmp-devel

Because my base CentOS Docker image doesn't have which, and because Cabal's compilation throws an error about a missing -lgmp, stupid me, it's obviously the gmp-devel package.

wget http://www.haskell.org/cabal/release/cabal-install-1.20.0.3/cabal-install-1.20.0.3.tar.gz
tar xf cabal-install-1.20.0.3.tar.gz
cd cabal-install-1.20.0.3 && ./bootstrap.sh

And it builds, and downloads, and builds some more, and downloads some more...

ln -s /.cabal/bin/cabal /usr/bin/cabal

Because installing Cabal doesn't add Cabal to your path. This is nasty, I don't care, let's move on.
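An alternative to the symlink is to put Cabal's bin directory on the PATH. The sketch below does that; the function name `prepend_path` is made up for illustration, and the directory is whatever Cabal reported during bootstrap (`/.cabal/bin` in the container above, usually `$HOME/.cabal/bin` elsewhere).

```shell
#!/bin/sh
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}
```

Calling `prepend_path "$HOME/.cabal/bin"` in a shell profile covers interactive use. In a Dockerfile the equivalent is an `ENV PATH` line, since each `RUN` step starts a fresh shell.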

cabal update

Because it couldn't have done so previously, and so it builds and downloads some more.


By this point, I could have probably hand-written the converted files on a daily basis and still wasted less time. But we're almost there...

So I install Pandoc (and Citeproc), finally:

cabal install pandoc pandoc-citeproc

Hmm, there's an error with a missing UTF-8 locale, let's generate it with this completely obvious command:

localedef -v -c -i en_US -f UTF-8 en_US.UTF-8

Funny thing is, this command actually throws an error, which Docker doesn't like when building, so hey, why not:

RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8 || true

And here we go, error ignored whatever happens.

But wait, building Pandoc still doesn't go through. As it turns out, it's missing an environment variable which, judging from the previous step, is hard-coded somewhere else. Of course it doesn't explicitly tell you; the depths of the Internet are your friend:

ENV LANG en_US.UTF-8
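Swallowing the error with `|| true` works, but it also hides real failures. A slightly more careful sketch generates the locale only when it is missing. It assumes `locale -a` lists installed locales in the glibc spelling (for example `en_US.utf8`), and the helper names `normalize_locale` and `has_locale` are made up for illustration.

```shell
#!/bin/sh
# normalize_locale NAME: lower-case NAME and drop hyphens, so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# has_locale NAME: succeed when NAME is among the locales reported by
# `locale -a`; fail when it is absent or when `locale` itself is unavailable.
has_locale() {
    want=$(normalize_locale "$1")
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$want"
}
```

The build step then becomes `has_locale en_US.UTF-8 || localedef -c -i en_US -f UTF-8 en_US.UTF-8`, possibly followed by a second `has_locale` check so that the build fails loudly if the locale is still missing.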

I re-build Pandoc and go read the entirety of Les Misérables in the meantime...

Another nasty symlink to the executables, because Cabal doesn't do that either:

ln -s /.cabal/bin/pandoc /usr/bin/pandoc
ln -s /.cabal/bin/pandoc-citeproc /usr/bin/pandoc-citeproc
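If you keep the symlink approach, a small loop avoids listing every executable by hand. This is only a sketch: `link_bins` is a made-up helper name, and the source directory is assumed to be wherever Cabal put its binaries (`/.cabal/bin` in this container).

```shell
#!/bin/sh
# link_bins SRC DEST: symlink every executable regular file in SRC into DEST,
# replacing any link of the same name that already exists there.
link_bins() {
    src=$1
    dest=$2
    for f in "$src"/*; do
        # Skip anything that is not an executable regular file
        # (this also covers the unmatched-glob case of an empty directory).
        [ -f "$f" ] && [ -x "$f" ] || continue
        ln -sf "$f" "$dest/$(basename "$f")"
    done
}
```

With that in place, `link_bins /.cabal/bin /usr/bin` covers pandoc, pandoc-citeproc and cabal in one go.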

And here we go, we now have a working Pandoc!


You may not believe it, but this is a really concise version of what I actually had to go through.

I don't doubt that I'm not the sharpest tool in the box, or that there are more shortcuts I could have taken, but in my view this process is completely crazy.

In my opinion, we've done this to ourselves, maybe because of developer ego, maybe because we're not as smart as we think we are, or maybe because we and the rest of the world expect too much, too quickly. I don't have an answer, nor do I have a solution to the whole problem, but I'll definitely think twice about telling anybody else to get into this business.

@slackmoehrle

I just spent a day last week getting pandoc working on CentOS 6.5. I also needed TeX, for which I had to follow https://www.tug.org/texlive/acquire-netinstall.html

@paulmaunders

Thank you for writing this up - at least the next person who Googles how to install pandoc on CentOS 6 can refer to this!

@tracker1

This is awesome... Even installing on a Windows server, making it available to console apps/services was pretty painful, but nothing like this... I'm glad I was able to remove pandoc from some of my requirements.

@lucasad

lucasad commented Nov 17, 2014

Any reason you didn't use the pandoc rpmspec?

@sarlalian

CentOS 6 was released on July 10, 2011 ... which is forever in internet years. Add in that CentOS isn't bleeding edge or really even leading edge when it's released and that makes for some very old software.

@hadley

hadley commented Nov 17, 2014

FWIW we (rstudio) provide pandoc binaries at https://s3.amazonaws.com/rstudio-buildtools/pandoc-1.13.1.zip (windows, linux and mac)

@berdario

I'm not sure if it'd have truly helped... but for when you start to consider compiling the compiler: maybe it would've been easier to just build the software on another (modern) machine, and then just deploy the binary in the container.

It's apparently pretty easy to statically link the needed libraries

@bpartridge

It's just as bad trying to get a new version of Pandoc installed on OS X - I gave up after half a day wasted. Why IPython Notebooks would depend on something with such arcane dependencies is beyond me.

@DanielBaird

I had a very similar experience getting pandoc onto a CentOS box. Even now that I have it working, the hours that VM takes to rebuild when I vagrant destroy && vagrant up are a reminder of the pandoc-install-figuring-out time that I will never get back.
I quite like pandoc but I think next time I'll try gimli https://github.com/walle/gimli

@boourns

boourns commented Nov 18, 2014

This reminds me of Linux roughly 10 years ago, around the dawn of the 2.6 kernels.

This was how every non-trivial application install went down. I would get my system in a happy state only to want to try some video editing app, or music app, or game, and go down this rabbit hole. Except usually ~4 layers in you would hit a kernel or other fundamental incompatibility, give up, and hope that you didn't fundamentally break the system you started with.

I remember consistently trying to have both hardware-rendering & dual monitors in X, and usually ending up breaking both. This might still be just as broken, I don't know. I got to the point of editing xorg.conf when converting a JAMMA cabinet to a MAME box, and once while working on a raspberry pi project. In both cases http://xkcd.com/963/ floated into my head and I changed my approach to computing. (as an aside, the raspberry pi works great purely in console mode with SDL applications, which is the approach with the excellent retropie project - in X it is nearly unusable).

That this experience is exceptional enough to warrant a post suggests we're at least slowly making some progress 😄.

That being said, these problems coupled with the naive determination and free time of university created an excellent learning environment.

@mkramlich

I ran into "Dependency Hell" last night trying to get Pandoc working on a Win8 box, to support the MD to PDF use cases.

Yes Pandoc exhibits one of the anti-patterns that a lot of Unixy software has inherited culturally. They're not alone. I like the discipline of having a software creator think of his tool conceptually as a single atomic "thing" that the user can just "get" and then it "just works", and without forcing unnecessary or confusing choices on newbie users. It is better for users if application software has only one external dependency, which is the "platform" it's built and blessed for, typically a particular combo of OS version and hardware. Anything else that software needs, like libraries or third-party tools, which are not already present and part of that particular OS version and hardware combo should be baked-in or otherwise included in the app package image -- even if that means redundant copies of libraries on the enduser's system, or risking having them get out of date, because that's a lesser evil in the normal case, especially for newbies or folks with simple needs.

Again, not to rag on Pandoc in particular. Great tool. Lots of great design choices about it apparently. Appreciate getting the benefit of it for free. But yes, getting the current release working on, say, Win8 was ridiculously over-complicated and under/misdocumented and under-automated. All kinds of terse error messages and failure cases which were Not Supposed To Happen because It's Just So Simple... except it wasn't. The "Batteries Included" design pattern is your friend. Also, dumb static archive files (whose files are ready to run-in-place, immediately after unpacking) are better than self-extracting scary executables or theoretically "helpful" installer magic -- that way lies many more paths to madness.

my two cents. (but based on decades of experiencing the effects of various alternatives.)

@tgulacsi

That's why I use Debian images under a (forced) CentOS host. And because I like Debian :)

@shofetim

Ah, that sounds easy. Try getting the CLI version of the Android emulator to run in a docker container...

  • You have to accept the license agreement(s) interactively.
  • First you install it, then you ask it to download its components so it can run. It downloads 13.4 GB before promptly deleting itself.
  • It filled up the host btrfs file system, which caused the host to crash, irrecoverably (twice).

^shrug^ I've heard there is a future in bio-intensive farming.

@DawidLoubser

Well, this is why some people run Arch Linux :-) I run a constantly-recent pandoc with none of this pain.

@Grovespaz

Why use CentOS?

@valpackett

OS X:

$ brew install cabal-install
$ cabal update
$ cabal install pandoc

FreeBSD:

$ pkg install hs-cabal-install
$ cabal update
$ cabal install pandoc

@psftw

psftw commented Nov 18, 2014

Great write-up, I feel your pain! I've had similar experiences with Python on RHEL6. By the way, the newly minted official Haskell docker image may have been useful to you (if being based on Debian is OK). I created a basic pandoc image (untested) with the following Dockerfile which will get you most of the way there:

FROM haskell:7.8
RUN cabal install pandoc pandoc-citeproc
ENV PATH /root/.cabal/bin:$PATH

https://registry.hub.docker.com/_/haskell/

@cstrahan

Please, please, please check out the Nix package manager.

As both a committer and heavy user of Nix (and NixOS), I highly recommend checking out Nix. The things that you get with Nix that would have helped:

1. Up to date packages

We work really hard to keep packages up to date. It's very likely that you'd be happy with our Pandoc, and if not ...

2. Deterministic builds

No more "works on my machine" problems. Nix is purely functional, so you're guaranteed that a build that worked yesterday on your friend's machine is going to work on yours today. So, if you weren't happy with our latest packaged version of Pandoc, you could easily bump a version number (automatically updating the url) and update the sha256 of the package, and rebuild [1]. But how do you find the necessary files? Well...

3. All package definitions are in one place

Just clone nixos/nixpkgs and modify pkgs/development/libraries/haskell/pandoc/default.nix. Done.

[1]: Actually, it's not even that difficult - we have a tool to automatically update package definitions from Hackage.


Nix makes package management a breeze. Think about how much time you spent waiting for your computer to compile GHC, when all you really wanted to do was compile the latest Pandoc. With Nix, the binary for GHC would be fetched from the Nix package CI server (because you hadn't changed its definition at all), and then only Pandoc would rebuild (because that's all you modified). All other dependencies would be pulled down as necessary (cabal and such), so no hunting for a million things to install (and potentially uninstall later if you only wanted them during the build) - Nix does this for you. Because it doesn't dump everything in /usr/{lib,bin}, there's no mess left after a build.

There aren't many things in life where something truly deserves the "holy shit, why isn't everyone using this?" response, but Nix is one of those things.

@walle

walle commented Nov 19, 2014

@DanielBaird thanks for the endorsement :)

I just created a Dockerfile for gimli, https://registry.hub.docker.com/u/walle/gimli/, that can run gimli in a container. It's my first Dockerfile though, so I'm bound to have missed something, enhancements are always appreciated.

@mietek

mietek commented Dec 7, 2014

You could also check out Halcyon.

$ time halcyon install pandoc
-----> Deploying app from install
       Prefix:                                   /app
       Label:                                    pandoc-1.13.1
       Source hash:                              58dbb34
       External storage:                         private and public

-----> Restoring install
       Downloading s3://s3.halcyon.sh/linux-ubuntu-14.04-x86_64/halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done
       Extracting halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done, 85MB
-----> Install restored
-----> Installing app into /app... done

-----> App deployed:                             pandoc-1.13.1

real    0m13.565s
user    0m3.514s
sys     0m3.154s

@johnstantongeddes

Thanks for this! Just moved to CentOS and had similar pain when doing this for Ubuntu so happy to get some tips to avoid future pain.

@johnstantongeddes

@mietek I tried your suggestion for Halcyon without luck. Moving the discussion to a question on SO, if you can answer!

@ZhengRui

ZhengRui commented Mar 1, 2015

Thanks, this saved me a lot of time.

@acanas

acanas commented Apr 8, 2015

Thanks for sharing your experience. Unfortunately it did not work for me, but this does: http://pkgs.org/centos-6/epel-x86_64/pandoc-1.9.4.1-1.1.el6.x86_64.rpm.html

@jstaf

jstaf commented Mar 23, 2016

Works like a charm on CentOS 6.6... words cannot express how happy I am to get this working (literally bashed my head against my desk for about 4 hours today before I found this).

@ababushkin

If you want to install cabal globally you can run ./bootstrap.sh --global

It will then be available in /usr/local/bin/cabal

@michaelnt

Seems to have got easier now if you use stack as per the official pandoc page.

Here is a Dockerfile for rhel6:

FROM rhel6
ENV VERSION 1.17.0.3

# stack https://docs.haskellstack.org/en/stable/install_and_upgrade/#centos
RUN curl -sSL https://s3.amazonaws.com/download.fpcomplete.com/centos/6/fpco.repo | tee /etc/yum.repos.d/fpco.repo
RUN curl -s -o /tmp/pandoc-${VERSION}.tar.gz https://hackage.haskell.org/package/pandoc-${VERSION}/pandoc-${VERSION}.tar.gz

RUN yum -y install stack gcc zlib-devel

RUN tar zxf /tmp/pandoc-${VERSION}.tar.gz -C /usr/local/src
RUN cd /usr/local/src/pandoc-${VERSION} && stack setup 
RUN cd /usr/local/src/pandoc-${VERSION} && stack install

@StubbsPKS

My issue with using stack is that I STILL have a binary that isn't portable bc it wants to look in weird directories for a stack work directory. I don't want stack on all of our prod machines. I JUST want pandoc.

@caot

caot commented Oct 26, 2017

pandoc through anaconda will be helpful. https://www.anaconda.com/

@pjrola

pjrola commented Aug 30, 2018

I'm leaving this link here for anyone who is still struggling. I managed to install it by following https://github.com/rstudio/rmarkdown/blob/master/PANDOC.md

wget -P /etc/yum.repos.d/ https://copr.fedoraproject.org/coprs/petersen/pandoc-el5/repo/epel-5/petersen-pandoc-el5-epel-5.repo
yum install -y pandoc pandoc-citeproc

will get you pandoc 1.15. If, however, you need the latest, you should do

wget -P /etc/yum.repos.d/ https://copr.fedorainfracloud.org/coprs/petersen/pandoc/repo/epel-7/petersen-pandoc-epel-7.repo
yum install -y pandoc pandoc-citeproc

which will get you 2.2.1.
Hopefully this saves someone from days of work like I just experienced.
