
@alvations
Last active April 10, 2017 05:00

So, it comes to the point where you can deploy the cool #nlproc / #neuralempty tech you've built, and there's this Docker thing that everyone keeps telling you to use so that installing the libraries/tools you need for your tech is less painful...

Contents

  1. What is Docker?

  2. "Dockerize..."

  3. Install Docker on Ubuntu 14.04

  4. Hello World

  5. The True Poison

  6. FAQ/Troubleshooting

What is Docker?

https://www.docker.com/what-docker

And now, "dockerize"... (Wave wand at the terminal)

Before continuing, take a look at these instructions:

If any of the above works for you, you can skip to here ;P

Install Docker on Ubuntu 14.04 (the TL;DR way)

Remove older versions of Docker

sudo apt-get remove docker docker-engine

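Update your package index and install the extra kernel image packages (on 14.04 these let Docker use the aufs storage driver)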

sudo apt-get update
sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual

Download and run Docker's convenience install script

wget -qO- https://get.docker.com/ | sudo sh

Add your user to the docker group

sudo usermod -aG docker $(whoami)

Simulate a logout + login to "activate" the group membership

su - $USER
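
To check whether the group change has actually taken effect, here is a minimal sketch of a helper (has_group is a hypothetical name, not part of Docker). It relies on id -nG: without a user argument it lists the groups of the current session, and with a user argument it reads that user's entry from the group database, so it can tell "added but not active yet" apart from "active".

```shell
# has_group GROUP [USER]
# Hypothetical helper. Without USER it checks the groups of the current
# session; with USER it checks that user's entry in the group database
# (which already includes groups added by usermod before you log in again).
has_group() {
  local groups
  if [ -n "${2:-}" ]; then
    groups=$(id -nG "$2") || return 1
  else
    groups=$(id -nG) || return 1
  fi
  # Pad with spaces so only whole group names match
  case " $groups " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

me=$(id -un)
if has_group docker; then
  echo "the docker group is active in this session"
elif has_group docker "$me"; then
  echo "$me is in the docker group, but this session is not: run 'su - \$USER'"
else
  echo "$me is not in the docker group yet: run 'sudo usermod -aG docker \$(whoami)'"
fi
```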

(Optional): Install Docker Compose

sudo apt-get -y install python-pip
sudo pip install docker-compose

Hello World

As with any new programming language, deep learning framework or anything else you write code for, here's Docker's version of Hello World:

docker run hello-world

BTW, I think we need some sort of Hello World for NLP.

You should see something like this:

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
78445dd45222: Pull complete 
Digest: sha256:c5515758d4c5e1e838e9cd307f6c6a0d620b5e07e6f927b07d05f6d12a1ac8d7
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://cloud.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

The True Poison

Now, to sell you the real poison....

Moses is great for Machine Translation (MT) and surely more deployable than any #NeuralEmpty tool. Although installation has come a long way, from offensively obtuse to a simple bjam, there's still no reason to re-type the installation commands every time you deploy Moses on a new system/machine.

Ulrich Germann showed how easily this can be done in this recorded lecture: http://lectures.ms.mff.cuni.cz/view.php?rec=291

The slides are on the MT Marathon 2015 wiki; the username and password are both mtm.

TL;DR, just give me the docker file already: https://gist.github.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3

And the image on Docker hub: https://hub.docker.com/r/alvations/momo/

$ docker pull alvations/momo
$ docker run -it alvations/momo bash 

Below, we walk through how the Docker image was created.

Let's Dockerize Moses

I'll use the latest Ubuntu image, but feel free to pick any distro you like.

To find the names of the available Docker images, use the docker search command, e.g.

$ docker search ubuntu    # Look for "ubuntu" 
$ docker search centos    # Look for "centos" 
$ docker search windows   # Look for "windows" 

If you see a warning like:

Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

According to http://stackoverflow.com/a/33596140/610569, do this:

sudo usermod -aG docker $(whoami)

then log out and log back in. You can use the same su - $USER trick.

On Mac OSX

(Just in case you're on a Mac, though this is a guide for Linux)

$ docker-machine start # Start the virtual machine for docker
$ docker-machine env  # Print the environment variables the docker client needs
$ eval "$(docker-machine env default)" # Set those environment variables

Start Ubuntu in Docker

Start an empty Ubuntu image:

docker run -it ubuntu bash

You will be "teleported" into a bash shell inside an Ubuntu container. First, we update and upgrade the distro:

apt-get update
apt-get install -y apt-utils debconf-utils
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
apt-get update
apt-get -y upgrade

You might see something like this at the end of the upgrade:

Setting up makedev (2.3.1-93ubuntu2~ubuntu16.04.1) ...
mv: cannot move 'console-' to 'console': Device or resource busy
makedev console c 5 1 root tty 0600: failed

You can safely ignore it. For more details, see http://stackoverflow.com/questions/43269412/device-or-resource-busy-docker

For the sake of future sanity, let's install sudo, nano, perl and python too.

apt-get install -y sudo nano
apt-get install -y perl 
apt-get install -y python-dev python3-dev python-pip python3-pip

We'll also install some common Unix tools: curl, wget, tar and dtrx:

apt-get install -y curl wget tar dtrx

Now, let's install Moses' dependencies:

apt-get install -y libboost-all-dev
apt-get install -y build-essential git-core pkg-config automake libtool wget zlib1g-dev python-dev libbz2-dev
apt-get install -y cmake 
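
The apt-get lines above can also be collected into one reusable step, which is handy once you move these commands into a Dockerfile. A minimal sketch, assuming you run it as root inside the container; MOSES_PACKAGES and install_moses_deps are hypothetical names, and DEBIAN_FRONTEND=noninteractive is the environment variable debconf reads, used here as an alternative to the debconf-set-selections line earlier.

```shell
# Packages from the apt-get steps above (duplicates such as wget and
# python-dev listed once). MOSES_PACKAGES is a hypothetical variable name.
MOSES_PACKAGES="sudo nano perl python-dev python3-dev python-pip python3-pip
curl wget tar dtrx libboost-all-dev build-essential git-core pkg-config
automake libtool zlib1g-dev libbz2-dev cmake"

# Install everything in one apt-get call without interactive debconf prompts
install_moses_deps() {
  # $MOSES_PACKAGES is deliberately unquoted so it splits into one argument per package
  # shellcheck disable=SC2086
  DEBIAN_FRONTEND=noninteractive apt-get install -y $MOSES_PACKAGES
}

# Usage (as root inside the container):
#   apt-get update && install_moses_deps
```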

Setup a user account

Still in the Docker container, after the apt-get -y upgrade, we create a sudo user that we'll use from now on. Let's use the username ubiwan with the password mosesdocker:

useradd -m -s /bin/bash ubiwan
echo 'ubiwan:mosesdocker' | chpasswd  # set the password (useradd -p expects an already-encrypted hash, not plain text)
usermod -aG sudo ubiwan  # add user to sudo list
su - ubiwan  # login to the ubiwan user

From now on, we'll use the ubiwan user to install Moses.

Installing Moses

Still in the docker container, after logging in to ubiwan, we continue with the Moses installation:

cd $HOME
git clone https://github.com/moses-smt/mosesdecoder.git 
cd mosesdecoder
make -f contrib/Makefiles/install-dependencies.gmake
./compile.sh  --max-kenlm-order=20 --max-factors=1000
cd $HOME

Now let's install MGIZA++ (the word aligner):

cd $HOME
git clone https://github.com/moses-smt/mgiza.git
cd mgiza/mgizapp
cmake .
make 
make install
cp scripts/merge_alignment.py bin/
cd $HOME

We know that mkcls is EXTREMELY slow, so let's replace it with @jonsafari's clustercat:

cd $HOME
git clone https://github.com/jonsafari/clustercat.git
cd clustercat
make -j 4
cd $HOME

Finally, we create a directory to keep all the external binaries that Moses will need:

cd $HOME
mkdir moses-training-tools
cp mgiza/mgizapp/bin/* moses-training-tools/
cp clustercat/bin/clustercat moses-training-tools/
cp clustercat/bin/mkcls moses-training-tools/mkcls-clustercat
mv moses-training-tools/mkcls moses-training-tools/mkcls-original
cp moses-training-tools/mkcls-clustercat moses-training-tools/mkcls
cd $HOME

The moses-training-tools directory should now contain these files:

$ ls moses-training-tools/

clustercat  d4norm  hmmnorm  mgiza  mkcls  mkcls-clustercat  
mkcls-original  plain2snt  snt2cooc  snt2coocrmp  snt2plain  symal
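
As a quick sanity check, the helper below compares the directory against the file list above. It is a minimal sketch (check_training_tools is a hypothetical name) and only checks that the files exist, not that they run.

```shell
# check_training_tools [DIR]
# Hypothetical helper: reports every expected file missing from DIR
# (default: ~/moses-training-tools) and fails if any is missing.
check_training_tools() {
  local dir="${1:-$HOME/moses-training-tools}" missing=0 tool
  for tool in clustercat d4norm hmmnorm mgiza mkcls mkcls-clustercat \
              mkcls-original plain2snt snt2cooc snt2coocrmp snt2plain symal; do
    if [ ! -e "$dir/$tool" ]; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage: check_training_tools && echo "all training tools are in place"
```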

(Optional): Delete the source code and strip the binaries to reduce the Docker image size:

rm -rf mgiza/
rm -rf clustercat/
strip mosesdecoder/bin/* mosesdecoder/lib/* moses-training-tools/*

Now let's exit the ubiwan shell and then the Docker container:

exit # out of the ubiwan user
exit # out of the Docker container (this stops it)

Create a Docker Hub account and repo

Now, we're ready to "save" our image.

First, let's create a Docker Hub account: go to Docker Hub and sign up.

Follow the instructions on https://docs.docker.com/engine/getstarted/step_five/ and create a repo named my-momo.

You should now find the my-momo repository at a URL containing your username, something like https://hub.docker.com/r/<username>/my-momo/, where <username> is your username (without the angle brackets).

Find the image ID and push to Docker Hub

To push what we've built to the my-momo repo on your Docker Hub account, we first have to commit the container we just exited into an image and give that image a name; let's call it momo. docker ps -q -l prints the ID of the most recently created container, i.e. the one we just exited.

# Commit the last created container into a new image named momo
docker commit $(docker ps -q -l) momo

Then, we tag the momo to the my-momo repo:

# Tag *momo* image to *my-momo* repo in the Docker Hub
docker tag momo <username>/my-momo

Finally, we push the image into the my-momo repo:

# Log in to Docker Hub (if you haven't already), then push to *my-momo*
docker login
docker push <username>/my-momo
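
The commit, tag and push steps can be wrapped in one function so the image and repo names are typed only once. A minimal sketch: publish_last_container is a hypothetical name, it assumes you have already run docker login, and it commits whatever container docker ps -q -l reports, so make sure that is the one you just exited.

```shell
# publish_last_container USERNAME [REPO]
# Hypothetical helper wrapping the commit/tag/push steps above.
# REPO defaults to my-momo; the local image name is momo, as above.
publish_last_container() {
  local user="${1:-}" repo="${2:-my-momo}" target container
  if [ -z "$user" ]; then
    echo "usage: publish_last_container USERNAME [REPO]" >&2
    return 1
  fi
  target="$user/$repo"
  # Docker rejects repository names that contain uppercase letters
  case "$target" in
    *[[:upper:]]*) echo "repository name must be lowercase: $target" >&2; return 1 ;;
  esac
  container=$(docker ps -q -l)
  if [ -z "$container" ]; then
    echo "no container found to commit" >&2
    return 1
  fi
  docker commit "$container" momo &&
    docker tag momo "$target" &&
    docker push "$target"
}

# Usage (after docker login): publish_last_container <username>
```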

Voila, now you have the Moses Docker image on your Docker Hub repo at https://hub.docker.com/r/<username>/my-momo/!!!

Run the Docker container on another machine

With Docker installed on the new machine and your my-momo repo on Docker Hub, you can simply do this to get Moses running:

docker pull <username>/my-momo
docker run -it <username>/my-momo bash

Now you can train Moses models, decode and have fun!!!

Do read up on using a Dockerfile to install the tools/software you need in the Docker image, too. It automates all of the steps above. See the Dockerfile tutorial.

TL;DR

If you've made it up to this point, you deserve this. (Or perhaps, you've just pressed the TL;DR link on the content page that leads to here -_-||| )

Without following any of the steps above, you can simply use the ready-made Moses Docker image I've created:

docker pull alvations/momo
docker run -it alvations/momo bash

Or if you would like to use/modify the Dockerfile: https://gist.github.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3

And to build the image from the Dockerfile:

wget https://gist.githubusercontent.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3/raw/7d566e3ae03443ae9b20bbf2da1dadf9c649e958/momo.dock -O momo.dockerfile
docker build -t momo - < momo.dockerfile

FAQ

Cannot connect to Docker daemon

If you can't connect to the Docker daemon:

username@server:~/momodocker$ docker pull alvations/momo
Using default tag: latest
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

See http://stackoverflow.com/questions/21871479/docker-cant-connect-to-docker-daemon
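
A quick way to tell whether the client can reach the daemon at all is docker info, which needs the daemon and fails when it is unreachable. A minimal sketch wrapping that check; docker_ready is a hypothetical name.

```shell
# docker_ready: hypothetical helper that succeeds only when the docker
# client is installed and can reach the Docker daemon.
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker is not installed (or not on PATH)" >&2
    return 1
  fi
  # docker info talks to the daemon, so it fails when the daemon is unreachable
  if ! docker info >/dev/null 2>&1; then
    echo "cannot reach the Docker daemon: is it running, and is $(id -un) in the docker group?" >&2
    return 1
  fi
}

# Usage: docker_ready && docker pull alvations/momo
```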

Error checking TLS connection: Host is not running

Running the following will sometimes fix the daemon connection:

username@server:~/momodocker$ eval "$(docker-machine env default)"

Other times, the next error you see is:

"Error checking TLS connection: Host is not running"

See docker-archive/toolbox#453

TL;DR, try this:

$ docker-machine rm default
About to remove default
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
Successfully removed default
username@server:~/momodocker$ docker-machine create default --driver virtualbox
Running pre-create checks...
(default) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(default) Latest release for github.com/boot2docker/boot2docker is v17.04.0-ce
(default) Downloading /Users/liling.tan/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v17.04.0-ce/boot2docker.iso...
(default) Creating VirtualBox VM...
(default) Creating SSH key...
(default) Starting the VM...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env default

How do I continue a Docker container that has exited?

When we started a container with docker run -it ubuntu bash, Docker automatically assigned it a container ID.

To locate the container ID, we look up the container that was created last (the one we just exited):

docker ps -q -l
  • -q: list only container IDs
  • -l: list only last created container

The command above will print something like e19f19d62ffd to the terminal.
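
With that ID, docker start -ai restarts the exited container and attaches your terminal to it, so you get a bash prompt in the same container with everything you installed still there. A minimal sketch; resume_last_container is a hypothetical name.

```shell
# resume_last_container: hypothetical helper that restarts the most recently
# created container and attaches your terminal to it again.
resume_last_container() {
  local cid
  cid=$(docker ps -q -l)
  if [ -z "$cid" ]; then
    echo "no containers found" >&2
    return 1
  fi
  # -a attaches to the container's output, -i keeps STDIN open so bash is usable
  docker start -ai "$cid"
}
```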

See also, http://stackoverflow.com/q/21928691/610569 and http://stackoverflow.com/q/19585028/610569
