Give me back my sanity

One of the many things I do for my group at work is to take care of automating as many things as possible. It usually brings me a lot of satisfaction, mostly because I get a kick out of making people's lives easier.

But sometimes, maybe too often, I end up in drawn-out struggles with machines and programs. And sometimes, these struggles bring me to the edge of despair, so much so that I regularly consider living on a computer-less island growing vegetables for a living.

This is the story of how I had to install Pandoc in a CentOS 6 Docker container. But more generally, this is the story of how I think computing is inherently broken, how programmers (myself included) tend to think that their way is the way, how we're ultimately replicating what most of us think is wrong with society, building upon layers and layers of (best-case scenario) obscure and/or weak foundations.

I would like to extend my gratitude to Google, StackOverflow, GitHub issues, but mostly, the people who make the beer I drink.


It all starts with this beautifully simple command:

yum install -y pandoc

After resolving about 80 dependencies and downloading around 150 MB of data, here we have it, a beautiful new Pandoc.

Except it isn't so new, because CentOS doesn't really do new, so instead we have Pandoc 1.9.4.1. You'd be tempted to think it doesn't matter so much, and that a few missing features aren't the end of the world. Unfortunately though this particular version is shipped with its very own show-stopping bugs.
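In hindsight, a build script can flag a too-old package before you get invested in it. Here is a minimal sketch, assuming GNU sort with its -V (version sort) option is available; the function name version_at_least and the 1.12 threshold in the usage comment are made up for illustration:

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeeds when HAVE >= WANT (dotted versions).
# Assumption: sort supports -V (GNU coreutils version sort).
version_at_least() {
    have=$1
    want=$2
    # The smaller of the two versions sorts first; HAVE >= WANT exactly
    # when that smaller one is WANT.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Hypothetical usage (1.12 is an arbitrary minimum, not a known-good version):
#   installed=$(pandoc --version | head -n 1 | awk '{print $2}')
#   version_at_least "$installed" 1.12 || echo "pandoc $installed is too old"
```

A plain string comparison would get this wrong, since "1.9.4.1" sorts after "1.13.1" alphabetically; version sort compares the numeric components instead.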


"Do not fret" I hear, let's build a newer version of Pandoc ourselves!

Pandoc is written in Haskell, and a quick Google search (like the hundreds of other quick searches that led to this point) tells me that the compiler for Haskell is named GHC. Why not...

My first reaction is to go:

yum install -y ghc

Quite a bit of time later (GHC is provided with 4 different flavours of every single library), we now have GHC 7.0.4-46.

Fast-forward a couple of hours and realize that this particular version of GHC is outdated, too...

It's ok, let's build it ourselves then:

RUN wget http://www.haskell.org/ghc/dist/7.8.2/ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN tar xf ghc-7.8.2-x86_64-unknown-linux-centos65.tar.bz2
RUN cd ghc-7.8.2 && ./configure
RUN cd ghc-7.8.2 && make install

GHC doesn't need make, just make install; they're pretty proud of that - I couldn't care less.

And that successfully installed the new version of GHC. Getting there, or are we?


Well as it turns out, Pandoc is installed through Cabal, some sort of package-manager-which-isn't-really-one for Haskell.

I tentatively try yum, which of course ends in failure, and proceed to build Cabal:

yum install -y which gmp-devel

Because my base CentOS Docker image doesn't have which, and because Cabal's compilation throws an error about a missing -lgmp, stupid me, it's obviously the gmp-devel package.

wget http://www.haskell.org/cabal/release/cabal-install-1.20.0.3/cabal-install-1.20.0.3.tar.gz
tar xf cabal-install-1.20.0.3.tar.gz
cd cabal-install-1.20.0.3 && ./bootstrap.sh

And it builds, and downloads, and builds some more, and downloads some more...

ln -s /.cabal/bin/cabal /usr/bin/cabal

Because installing Cabal doesn't add Cabal to your path. This is nasty, I don't care, let's move on.
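A less nasty alternative to the symlink is to put Cabal's bin directory on the PATH. This is only a sketch: it assumes Cabal installs its executables into $HOME/.cabal/bin (in my container HOME resolves to /, hence /.cabal/bin), and prepend_path is a hypothetical helper name:

```shell
#!/bin/sh
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: cabal-install put its executables in $HOME/.cabal/bin.
# ${HOME%/} strips a trailing slash so HOME=/ yields /.cabal/bin.
prepend_path "${HOME%/}/.cabal/bin"
```

Each RUN step in a Dockerfile starts a fresh shell, so an export like this does not survive between steps; inside a Dockerfile the same idea is expressed with an ENV PATH line.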

cabal update

Because it couldn't have done so previously, and so it builds and downloads some more.


By this point, I could have probably hand-written the converted files on a daily basis and still wasted less time. But we're almost there...

So I install Pandoc (and Citeproc), finally:

cabal install pandoc pandoc-citeproc

Hmm, there's an error with a missing UTF-8 locale, let's generate it with this completely obvious command:

localedef -v -c -i en_US -f UTF-8 en_US.UTF-8

Funny thing is, this command actually throws an error, which Docker doesn't like when building, so hey, why not:

RUN localedef -v -c -i en_US -f UTF-8 en_US.UTF-8 || true

And here we go, error ignored whatever happens.
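If swallowing every error feels too blunt, the step can ignore localedef's exit status and then check that the locale actually exists. A minimal sketch, assuming the locale does get generated despite the non-zero exit and that locale -a lists it in glibc's normalized spelling (en_US.utf8); locale_available is a hypothetical helper:

```shell
#!/bin/sh
# locale_available NAME: succeeds if `locale -a` lists NAME.
# The comparison is case-insensitive and ignores hyphens, so that
# en_US.UTF-8 matches the en_US.utf8 spelling that glibc prints.
locale_available() {
    wanted=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$wanted"
}

# Hypothetical usage after running localedef and discarding its status:
#   locale_available en_US.UTF-8 || { echo "locale still missing" >&2; exit 1; }
```

That way the build only fails when the locale is really missing, not when localedef merely complains.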

But wait, building Pandoc still doesn't go through, as it turns out it's missing an environment variable which we could tell from the previous step was hard-coded somewhere else. Of course it doesn't explicitly tell you, the depths of the Internet are your friend:

ENV LANG en_US.UTF-8

I re-build Pandoc and go read the entirety of Les Misérables in the meantime...

Another nasty symlink to the executables, because Cabal doesn't do that either:

ln -s /.cabal/bin/pandoc /usr/bin/pandoc
ln -s /.cabal/bin/pandoc-citeproc /usr/bin/pandoc-citeproc

And here we go, we now have a working Pandoc!
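Given how many steps silently depended on the PATH, it may be worth ending the build with a check that the executables really resolve. A small sketch using only the POSIX command -v builtin; missing_commands is a hypothetical helper name:

```shell
#!/bin/sh
# missing_commands NAME...: print each NAME that is not found on PATH;
# succeeds only if none are missing.
missing_commands() {
    status=0
    for name in "$@"; do
        if ! command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical usage at the end of the build:
#   missing_commands pandoc pandoc-citeproc || exit 1
```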


You may not believe it, but this is a really concise version of what I actually had to go through.

I don't doubt that I'm not the sharpest tool in the box, or that there are shortcuts I could have taken, but in my view this process is completely crazy.

In my opinion, we've done this to ourselves, maybe because of developer ego, maybe because we're not as smart as we think we are, or maybe because we and the rest of the world expect too much, too quickly. I don't have an answer, nor do I have a solution to the whole problem, but I'll definitely think twice about telling anybody else to get into this business.

hughdbrown commented Nov 17, 2014

Wow, this sounds like what I did by hand to install pandoc, except I did not put the result into a docker container. And it might have been slightly easier since I was installing to ubuntu, not centos (i.e. the dependencies are more up to date).

slackmoehrle commented Nov 17, 2014

I just spent a day last week getting pandoc working on CentOS 6.5. I also needed TeX, for which I followed: https://www.tug.org/texlive/acquire-netinstall.html

paulmaunders commented Nov 17, 2014

Thank you for writing this up - at least the next person who Googles how to install pandoc on CentOS 6 can refer to this!

tracker1 commented Nov 17, 2014

This is awesome... Even installing on a Windows server, making it available to console apps/services was pretty painful, but nothing like this... I'm glad I was able to remove pandoc from some of my requirements.

lucasad commented Nov 17, 2014

Any reason you didn't use the pandoc rpmspec?

sarlalian commented Nov 17, 2014

CentOS 6 was released on July 10, 2011 ... which is forever in internet years. Add in that CentOS isn't bleeding edge or really even leading edge when it's released and that makes for some very old software.

hadley commented Nov 17, 2014

FWIW we (rstudio) provide pandoc binaries at https://s3.amazonaws.com/rstudio-buildtools/pandoc-1.13.1.zip (windows, linux and mac)

berdario commented Nov 17, 2014

I'm not sure if it'd have truly helped... but for when you start considering compiling the compiler: maybe it would've been easier to just build the software on another (modern) machine, and then just deploy the binary in the container.

It's apparently pretty easy to statically link the needed libraries

bpartridge commented Nov 18, 2014

It's just as bad trying to get a new version of Pandoc installed on OS X - I gave up after half a day wasted. Why IPython Notebooks would depend on something with such arcane dependencies is beyond me.

DanielBaird commented Nov 18, 2014

I had a very similar experience getting pandoc onto a centOS box. Even now that I have it working, the hours that VM takes to rebuild when I vagrant destroy && vagrant up is a reminder of the pandoc-install-figuring-out time that I will never get back.
I quite like pandoc but I think next time I'll try gimli https://github.com/walle/gimli

boourns commented Nov 18, 2014

This reminds me of linux approximately ~10 years ago, around the dawn of the 2.6 kernels.

This was how every non-trivial application install went down. I would get my system in a happy state only to want to try some video editing app, or music app, or game, and go down this rabbit hole. Except usually ~4 layers in you would hit a kernel or other fundamental incompatibility, give up, and hope that you didn't fundamentally break the system you started with.

I remember consistently trying to have both hardware-rendering & dual monitors in X, and usually ending up breaking both. This might still be just as broken, I don't know. I got to the point of editing xorg.conf when converting a JAMMA cabinet to a MAME box, and once while working on a raspberry pi project. In both cases http://xkcd.com/963/ floated into my head and I changed my approach to computing. (as an aside, the raspberry pi works great purely in console mode with SDL applications, which is the approach with the excellent retropie project - in X it is nearly unusable).

That this experience is exceptional enough to warrant a post suggests we're at least slowly making some progress 😄.

That being said, these problems coupled with the naive determination and free time of university created an excellent learning environment.

mkramlich commented Nov 18, 2014

I ran into "Dependency Hell" last night trying to get Pandoc working on a Win8 box, to support the MD to PDF use cases.

Yes Pandoc exhibits one of the anti-patterns that a lot of Unixy software has inherited culturally. They're not alone. I like the discipline of having a software creator think of his tool conceptually as a single atomic "thing" that the user can just "get" and then it "just works", and without forcing unnecessary or confusing choices on newbie users. It is better for users if application software has only one external dependency, which is the "platform" it's built and blessed for, typically a particular combo of OS version and hardware. Anything else that software needs, like libraries or third-party tools, which are not already present and part of that particular OS version and hardware combo should be baked-in or otherwise included in the app package image -- even if that means redundant copies of libraries on the enduser's system, or risking having them get out of date, because that's a lesser evil in the normal case, especially for newbies or folks with simple needs.

Again, not to rag on Pandoc in particular. Great tool. Lots of great design choices about it apparently. Appreciate getting the benefit of it for free. But yes getting the current release working on, say, Win8 was ridiculously over-complicated and under/misdocumented and under-automated. All kinds of terse error messages and failure cases which were Not Supposed To Happen because It's Just So Simple... except it wasn't. The "Batteries Included" design pattern is your friend. Also, dumb static archive files (whose files are ready to run-in-place, immediately after unpacking) are better than self-extracting scary executables or theoretically "helpful" installer magic -- that way lies many more paths to madness.

my two cents. (but based on decades of experiencing the effects of various alternatives.)

tgulacsi commented Nov 18, 2014

That's why I use Debian images under a (forced) CentOS host. And because I like Debian :)

shofetim commented Nov 18, 2014

Ah, that sounds easy. Try getting the CLI version of the Android emulator to run in a docker container...

  • You have to accept the license agreement(s) interactively.
  • First you install it, then you ask it to download its components so it can run. It downloads 13.4 GB before promptly deleting itself.
  • It filled up the host btrfs file system, which caused the host to crash, irrecoverably (twice).

^shrug^ I've heard there is a future in bio-intensive farming.

DawidLoubser commented Nov 18, 2014

Well, this is why some people run Arch Linux :-) I run a constantly-recent pandoc with none of this pain.

Grovespaz commented Nov 18, 2014

Why use CentOS?

myfreeweb commented Nov 18, 2014

OS X:

$ brew install cabal-install
$ cabal update
$ cabal install pandoc

FreeBSD:

$ pkg install hs-cabal-install
$ cabal update
$ cabal install pandoc

psftw commented Nov 18, 2014

Great write-up, I feel your pain! I've had similar experiences with Python on RHEL6. By the way, the newly minted official Haskell docker image may have been useful to you (if being based on Debian is OK). I created a basic pandoc image (untested) with the following Dockerfile which will get you most of the way there:

FROM haskell:7.8
RUN cabal install pandoc pandoc-citeproc
ENV PATH /root/.cabal/bin:$PATH

https://registry.hub.docker.com/_/haskell/

cstrahan commented Nov 18, 2014

Please, please, please check out the Nix package manager.

As both a committer and heavy user of Nix (and NixOS), I highly recommend checking out Nix. The things that you get with Nix that would have helped:

1. Up to date packages

We work really hard to keep packages up to date. It's very likely that you'd be happy with our Pandoc, and if not ...

2. Deterministic builds

No more "works on my machine" problems. Nix is purely functional, so you're guaranteed that a build that worked yesterday on your friend's machine is going to work on yours today. So, if you weren't happy with our latest packaged version of Pandoc, you could easily bump a version number (automatically updating the url) and update the sha256 of the package, and rebuild [1]. But how do you find the necessary files? Well...

3. All package definitions are in one place

Just clone nixos/nixpkgs and modify pkgs/development/libraries/haskell/pandoc/default.nix. Done.

[1]: Actually, it's not even that difficult - we have a tool to automatically update package definitions from Hackage.


Nix makes package management a breeze. Think about how much time you spent waiting for your computer to compile GHC, when all you really wanted to do was compile the latest Pandoc. With Nix, the binary for GHC would be fetched from the Nix package CI server (because you hadn't changed its definition at all), and then only Pandoc would rebuild (because that's all you modified). All other dependencies would be pulled down as necessary (cabal and such), so no hunting for a million things to install (and potentially uninstall later if you only wanted them during the build) - Nix does this for you. Because it doesn't dump everything in /usr/{lib,bin}, there's no mess left after a build.

There aren't many things in life where something truly deserves the "holy shit, why isn't everyone using this?" response, but Nix is one of those things.

walle commented Nov 19, 2014

@DanielBaird thanks for the endorsement :)

I just created a Dockerfile for gimli, https://registry.hub.docker.com/u/walle/gimli/, that can run gimli in a container. It's my first Dockerfile though, so I'm bound to have missed something, enhancements are always appreciated.

mietek commented Dec 7, 2014

You could also check out Halcyon.

$ time halcyon install pandoc
-----> Deploying app from install
       Prefix:                                   /app
       Label:                                    pandoc-1.13.1
       Source hash:                              58dbb34
       External storage:                         private and public

-----> Restoring install
       Downloading s3://s3.halcyon.sh/linux-ubuntu-14.04-x86_64/halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done
       Extracting halcyon-install-58dbb34-pandoc-1.13.1.tar.gz... done, 85MB
-----> Install restored
-----> Installing app into /app... done

-----> App deployed:                             pandoc-1.13.1

real    0m13.565s
user    0m3.514s
sys     0m3.154s

johnstantongeddes commented Feb 13, 2015

Thanks for this! Just moved to CentOS and had similar pain when doing this for Ubuntu so happy to get some tips to avoid future pain.

johnstantongeddes commented Feb 13, 2015

@mietek tried your suggestion for Halcyon without luck. moving discussion to question on SO if you can answer!

ZhengRui commented Mar 1, 2015

Thanks, saved me a lot of time.

acanas commented Apr 8, 2015

Thanks for sharing your experience. Unfortunately it did not work for me, but this does: http://pkgs.org/centos-6/epel-x86_64/pandoc-1.9.4.1-1.1.el6.x86_64.rpm.html

jstaf commented Mar 23, 2016

Works like a charm on CentOS 6.6... words cannot express how happy I am to get this working (literally bashed my head against my desk for about 4 hours today before I found this).

ababushkin commented May 5, 2016

If you want to install cabal globally you can run ./bootstrap.sh --global

It will then be available in /usr/local/bin/cabal

michaelnt commented Aug 30, 2016

Seems to have got easier now if you use stack as per the official pandoc page.

Here is a Dockerfile for rhel6:

FROM rhel6
ENV VERSION 1.17.0.3

# stack https://docs.haskellstack.org/en/stable/install_and_upgrade/#centos
RUN curl -sSL https://s3.amazonaws.com/download.fpcomplete.com/centos/6/fpco.repo | tee /etc/yum.repos.d/fpco.repo
RUN curl -s -o /tmp/pandoc-${VERSION}.tar.gz https://hackage.haskell.org/package/pandoc-${VERSION}/pandoc-${VERSION}.tar.gz

RUN yum -y install stack gcc zlib-devel

RUN tar zxf /tmp/pandoc-${VERSION}.tar.gz -C /usr/local/src
RUN cd /usr/local/src/pandoc-${VERSION} && stack setup 
RUN cd /usr/local/src/pandoc-${VERSION} && stack install

StubbsPKS commented Dec 27, 2016

My issue with using stack is that I STILL have a binary that isn't portable bc it wants to look in weird directories for a stack work directory. I don't want stack on all of our prod machines. I JUST want pandoc.

caot commented Oct 26, 2017

pandoc through anaconda will be helpful. https://www.anaconda.com/

pjrola commented Aug 30, 2018

I'm leaving this link here for anyone who is still struggling. I managed to install it by following https://github.com/rstudio/rmarkdown/blob/master/PANDOC.md

wget -P /etc/yum.repos.d/ https://copr.fedoraproject.org/coprs/petersen/pandoc-el5/repo/epel-5/petersen-pandoc-el5-epel-5.repo
yum install -y pandoc pandoc-citeproc

will get you pandoc 1.15. If, however, you need the latest, you should do

wget -P /etc/yum.repos.d/ https://copr.fedorainfracloud.org/coprs/petersen/pandoc/repo/epel-7/petersen-pandoc-epel-7.repo
yum install -y pandoc pandoc-citeproc

which will get you 2.2.1.
Hopefully this saves someone from days of work like I just experienced.
