
@grownseed
Last active October 12, 2023 17:36

Give me back my sanity

One of the many things I do for my group at work is to take care of automating as many things as possible. It usually brings me a lot of satisfaction, mostly because I get a kick out of making people's lives easier.

But sometimes, maybe too often, I end up in drawn-out struggles with machines and programs. And sometimes, these struggles bring me to the edge of despair, so much so that I regularly consider moving to a computer-less island and growing vegetables for a living.

This is the story of how I had to install Pandoc in a CentOS 6 Docker container. But more generally, this is the story of why I think computing is inherently broken: how programmers (myself included) tend to think that their way is the way, and how we're ultimately replicating what most of us think is wrong with society, building upon layers and layers of (in the best-case scenario) obscure and/or weak foundations.

I would like to extend my gratitude to Google, StackOverflow and GitHub issues, but mostly to the people who make the beer I drink.


It all starts with this beautifully simple command:

yum install -y pandoc

After resolving about 80 dependencies and downloading around 150 MB of data, here we have it: a beautiful new Pandoc.

Except it isn't so new, because CentOS doesn't really do new, so instead we have Pandoc 1.9.4.1. You'd be tempted to think it doesn't matter much, and that a few missing features aren't the end of the world. Unfortunately, though, this particular version ships with its very own show-stopping bugs.


"Do not fret" I hear, let's build a newer version of Pandoc ourselves!

Pandoc is written in Haskell. A quick Google search (like the hundreds of other quick searches that led to this point) tells me that the Haskell compiler is called GHC. Why not...

My first reaction is to go:

yum install -y ghc

Quite a bit of time later (GHC is provided with 4 different flavours of every single library), we now have GHC 7.0.4-46.

Fast-forward a couple of hours, and I realize that this particular version of GHC is outdated, too...

It's OK, let's install a newer one ourselves, then:

RUN wget http://www.haskell.org/ghc/dist/7.8.2/ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN tar xf ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN cd ghc-7.8.2 && ./configure
RUN cd ghc-7.8.2 && make install

The GHC binary distribution doesn't need make, just make install; they're pretty proud of that. I couldn't care less.

And that successfully installs the new version of GHC. Getting there... or are we?


Well, as it turns out, Pandoc is installed through Cabal, some sort of package-manager-which-isn't-really-one for Haskell.

I tentatively try yum, which of course ends in failure, and proceed to build Cabal:

yum install -y which gmp-devel

Because my base CentOS Docker image doesn't have which, and because Cabal's compilation throws an error about a missing -lgmp (stupid me, it's obviously the gmp-devel package).

wget http://www.haskell.org/cabal/release/cabal-install-1.20.0.3/cabal-install-1.20.0.3.tar.gz
tar xf cabal-install-1.20.0.3.tar.gz
cd cabal-install-1.20.0.3 && ./bootstrap.sh

And it builds, and downloads, and builds some more, and downloads some more...

ln -s /.cabal/bin/cabal /usr/bin/cabal

Because installing Cabal doesn't add Cabal to your PATH. This is nasty; I don't care, let's move on.
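Rather than symlinking each binary, one option (the same approach psftw uses in the comments below) is to prepend Cabal's bin directory to PATH once. This is a sketch that assumes Cabal installs into /.cabal/bin, as it did in this build; Cabal's default is ~/.cabal/bin, so adjust the path if your build's home directory differs.

```dockerfile
# Sketch: expose everything Cabal installs (cabal now, pandoc and
# pandoc-citeproc later) without per-binary symlinks.
# /.cabal/bin is copied from the post; Docker substitutes the
# previous $PATH value when processing ENV instructions.
ENV PATH /.cabal/bin:$PATH
```

Since later `cabal install` runs put their executables in the same directory, this one line would also make the final symlinks for pandoc and pandoc-citeproc unnecessary.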

cabal update

Because it couldn't possibly have done that earlier, and so it builds and downloads some more.


By this point, I could probably have hand-written the converted files every day and still wasted less time. But we're almost there...

So I install Pandoc (and Citeproc), finally:

cabal install pandoc pandoc-citeproc

Hmm, there's an error about a missing UTF-8 locale. Let's generate it with this completely obvious command:

localedef -v -c -i en_US -f UTF-8 en_US.UTF-8

Funny thing is, this command itself exits with an error, which makes the Docker build fail, so hey, why not:

RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8 || true

And there we go: the error is ignored, whatever happens.
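If you'd rather not swallow every possible failure, a stricter variant is a small step up from `|| true`. This sketch assumes localedef follows the POSIX exit-status convention (0 means success, 1 means warnings were issued but the locale was still created, anything higher means no locale was written) and that the failure seen here was only warnings; neither is confirmed by the post.

```dockerfile
# Sketch: tolerate exit status 1 (warnings, locale still created per POSIX)
# but fail the build on anything higher. Shell-form RUN executes via
# /bin/sh -c, so $? is evaluated by the shell, not by Docker.
RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8; [ $? -le 1 ]
```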

But wait, the Pandoc build still doesn't go through. As it turns out, it's missing an environment variable pointing at the locale generated in the previous step. Of course, it doesn't tell you that explicitly; the depths of the Internet are your friend:

ENV LANG en_US.UTF-8

I rebuild Pandoc and go read the entirety of Les Misérables in the meantime...

Then come more nasty symlinks to the executables, because Cabal doesn't do that either:

ln -s /.cabal/bin/pandoc /usr/bin/pandoc
ln -s /.cabal/bin/pandoc-citeproc /usr/bin/pandoc-citeproc

And here we go, we now have a working Pandoc!
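For reference, here is a sketch of the steps above collected into one Dockerfile, in the order the post runs them. It assumes a base image tagged centos:6 that already provides wget, bzip2, make and a C compiler (the post doesn't say what its image shipped with, and more packages may be needed), keeps the /.cabal paths exactly as the post uses them, and reuses the 2014 download URLs, which may no longer resolve.

```dockerfile
# Sketch only: the post's steps in order. Base image tag is an assumption.
FROM centos:6

# 'which' is missing from the base image; gmp-devel provides -lgmp for Cabal
RUN yum install -y which gmp-devel

# GHC 7.8.2 binary distribution: configure, then make install (no plain make)
RUN wget http://www.haskell.org/ghc/dist/7.8.2/ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN tar xf ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN cd ghc-7.8.2 && ./configure && make install

# Bootstrap cabal-install, link it onto the PATH, fetch the package index
RUN wget http://www.haskell.org/cabal/release/cabal-install-1.20.0.3/cabal-install-1.20.0.3.tar.gz
RUN tar xf cabal-install-1.20.0.3.tar.gz
RUN cd cabal-install-1.20.0.3 && ./bootstrap.sh
RUN ln -s /.cabal/bin/cabal /usr/bin/cabal
RUN cabal update

# The Pandoc build needs a UTF-8 locale and LANG pointing at it
RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8 || true
ENV LANG en_US.UTF-8

# Build Pandoc and pandoc-citeproc, then link the executables
RUN cabal install pandoc pandoc-citeproc
RUN ln -s /.cabal/bin/pandoc /usr/bin/pandoc
RUN ln -s /.cabal/bin/pandoc-citeproc /usr/bin/pandoc-citeproc
```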


You may not believe it, but this is a really concise version of what I actually had to go through.

I don't doubt that I'm not the sharpest tool in the shed, or that there are shortcuts I could have taken, but in my view this process is completely crazy.

In my opinion, we've done this to ourselves, maybe because of developer ego, maybe because we're not as smart as we think we are, or maybe because we and the rest of the world expect too much, too quickly. I don't have an answer, nor do I have a solution to the whole problem, but I'll definitely think twice about telling anybody else to get into this business.

@tgulacsi

That's why I use Debian images under a (forced) CentOS host. And because I like Debian :)

@shofetim

Ah, that sounds easy. Try getting the CLI version of the Android emulator to run in a docker container...

  • You have to accept the license agreement(s) interactively.
  • First you install it, then you ask it to download its components so it can run. It downloads 13.4 GB before promptly deleting itself.
  • It filled up the host btrfs file system, which caused the host to crash, irrecoverably (twice).

^shrug^ I've heard there is a future in bio-intensive farming.

@DawidLoubser

Well, this is why some people run Arch Linux :-) I run a constantly-recent pandoc with none of this pain.

@Grovespaz

Why use CentOS?

@valpackett

OS X:

$ brew install cabal-install
$ cabal update
$ cabal install pandoc

FreeBSD:

$ pkg install hs-cabal-install
$ cabal update
$ cabal install pandoc

@psftw

psftw commented Nov 18, 2014

Great write-up, I feel your pain! I've had similar experiences with Python on RHEL6. By the way, the newly minted official Haskell docker image may have been useful to you (if being based on Debian is OK). I created a basic pandoc image (untested) with the following Dockerfile which will get you most of the way there:

FROM haskell:7.8
RUN cabal install pandoc pandoc-citeproc
ENV PATH /root/.cabal/bin:$PATH

https://registry.hub.docker.com/_/haskell/

@cstrahan

Please, please, please check out the Nix package manager.

As both a committer and heavy user of Nix (and NixOS), I highly recommend checking out Nix. The things that you get with Nix that would have helped:

1. Up to date packages

We work really hard to keep packages up to date. It's very likely that you'd be happy with our Pandoc, and if not ...

2. Deterministic builds

No more "works on my machine" problems. Nix is purely functional, so you're guaranteed that a build that worked yesterday on your friend's machine is going to work on yours today. So, if you weren't happy with our latest packaged version of Pandoc, you could easily bump a version number (automatically updating the url), update the sha256 of the package, and rebuild [1]. But how do you find the necessary files? Well...

3. All package definitions are in one place

Just clone nixos/nixpkgs and modify pkgs/development/libraries/haskell/pandoc/default.nix. Done.

[1]: Actually, it's not even that difficult: we have a tool to automatically update package definitions from Hackage.


Nix makes package management a breeze. Think about how much time you spent waiting for your computer to compile GHC, when all you really wanted to do was compile the latest Pandoc. With Nix, the binary for GHC would be fetched from the Nix package CI server (because you hadn't changed its definition at all), and then only Pandoc would rebuild (because that's all you modified). All other dependencies would be pulled down as necessary (cabal and such), so there's no hunting for a million things to install (and potentially uninstall later if you only wanted them during the build); Nix does this for you. Because it doesn't dump everything in /usr/{lib,bin}, there's no mess left after a build.

There aren't many things in life where something truly deserves the "holy shit, why isn't everyone using this?" response, but Nix is one of those things.

@walle

walle commented Nov 19, 2014

@DanielBaird thanks for the endorsement :)

I just created a Dockerfile for gimli, https://registry.hub.docker.com/u/walle/gimli/, that can run gimli in a container. It's my first Dockerfile though, so I'm bound to have missed something; enhancements are always appreciated.

@mietek

mietek commented Dec 7, 2014

You could also check out Halcyon.

$ time halcyon install pandoc
-----> Deploying app from install
       Prefix:                                   /app
       Label:                                    pandoc-1.13.1
       Source hash:                              58dbb34
       External storage:                         private and public

-----> Restoring install
       Downloading s3://s3.halcyon.sh/linux-ubuntu-14.04-x86_64/halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done
       Extracting halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done, 85MB
-----> Install restored
-----> Installing app into /app... done

-----> App deployed:                             pandoc-1.13.1

real    0m13.565s
user    0m3.514s
sys     0m3.154s

@johnstantongeddes

Thanks for this! Just moved to CentOS and had similar pain doing this on Ubuntu, so I'm happy to get some tips to avoid future pain.

@johnstantongeddes

@mietek I tried your suggestion for Halcyon without luck. I'm moving the discussion to a question on SO, if you can answer!

@ZhengRui

ZhengRui commented Mar 1, 2015

Thanks, saved me a lot of time

@acanas

acanas commented Apr 8, 2015

Thanks for sharing your experience. Unfortunately it didn't work for me, but this does: http://pkgs.org/centos-6/epel-x86_64/pandoc-1.9.4.1-1.1.el6.x86_64.rpm.html

@jstaf

jstaf commented Mar 23, 2016

Works like a charm on CentOS 6.6... words cannot express how happy I am to get this working (literally bashed my head against my desk for about 4 hours today before I found this).

@ababushkin

If you want to install cabal globally you can run ./bootstrap.sh --global

It will then be available in /usr/local/bin/cabal

@michaelnt

It seems to have gotten easier now if you use stack, as per the official pandoc page.

Here's a Dockerfile for RHEL 6:

FROM rhel6
ENV VERSION 1.17.0.3

# stack https://docs.haskellstack.org/en/stable/install_and_upgrade/#centos
RUN curl -sSL https://s3.amazonaws.com/download.fpcomplete.com/centos/6/fpco.repo | tee /etc/yum.repos.d/fpco.repo
RUN curl -s -o /tmp/pandoc-${VERSION}.tar.gz https://hackage.haskell.org/package/pandoc-${VERSION}/pandoc-${VERSION}.tar.gz

RUN yum -y install stack gcc zlib-devel

RUN tar zxf /tmp/pandoc-${VERSION}.tar.gz -C /usr/local/src
RUN cd /usr/local/src/pandoc-${VERSION} && stack setup 
RUN cd /usr/local/src/pandoc-${VERSION} && stack install

@StubbsPKS

My issue with using stack is that I STILL have a binary that isn't portable, because it wants to look in weird directories for a stack work directory. I don't want stack on all of our prod machines. I JUST want pandoc.

@caot

caot commented Oct 26, 2017

Installing pandoc through Anaconda may be helpful. https://www.anaconda.com/

@pjrola

pjrola commented Aug 30, 2018

I'm leaving this link here for anyone who is still struggling. I managed to install it by following https://github.com/rstudio/rmarkdown/blob/master/PANDOC.md

wget -P /etc/yum.repos.d/ https://copr.fedoraproject.org/coprs/petersen/pandoc-el5/repo/epel-5/petersen-pandoc-el5-epel-5.repo
yum install -y pandoc pandoc-citeproc

will get you pandoc 1.15. If, however, you need the latest, you should do:

wget -P /etc/yum.repos.d/ https://copr.fedorainfracloud.org/coprs/petersen/pandoc/repo/epel-7/petersen-pandoc-epel-7.repo
yum install -y pandoc pandoc-citeproc

which will get you 2.2.1. Hopefully this saves someone from the days of work I just went through.
