So, it comes to the point where you can deploy the cool #nlproc / #neuralempty tech you've built, and there's this Docker thing that everyone is telling you to use so that installing the libraries/tools you need for your tech is less painful...
- Ubuntu 14.04 (unofficial)
- Ubuntu (official)
- Mac OSX (official)
https://www.docker.com/what-docker
Before continuing, take a look at these instructions:
- https://docs.docker.com/engine/installation/
- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-14-04
- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04
If any of the above works for you, you can skip to here ;P
Remove older versions of Docker:
sudo apt-get remove docker docker-engine
Update your distro's package lists and install the extra kernel packages:
sudo apt-get update
sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
Fetch the install script with wget and run it:
wget -qO- https://get.docker.com/ | sudo sh
Add your user to the docker group:
sudo usermod -aG docker $(whoami)
Simulate a logout + login to "activate" the group membership
su - $USER
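To double-check that the group membership really is active in your current shell, here's a minimal sketch. The in_docker_group helper is a made-up name for illustration; it only inspects the group list printed by id -nG.

```shell
# Hypothetical helper: succeeds if the space-separated group list in $1 contains "docker".
in_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the names of all groups the current user belongs to.
if in_docker_group "$(id -nG)"; then
    echo "docker group is active in this shell"
else
    echo "docker group is not active yet; log out and back in (or run: su - \$USER)"
fi
```

If it reports that the group is not active, the su - $USER trick above (or a real logout + login) should sort it out.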
(Optional): Install Docker Compose
sudo apt-get -y install python-pip
sudo pip install docker-compose
As with any new programming language, deep learning framework or anything else that you have to write code for, here's Docker's version of Hello World:
docker run hello-world
BTW, I think we need some sort of Hello World for NLP.
You should see something like this:
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
78445dd45222: Pull complete
Digest: sha256:c5515758d4c5e1e838e9cd307f6c6a0d620b5e07e6f927b07d05f6d12a1ac8d7
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://cloud.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/engine/userguide/
Now, to sell you the real poison....
Moses is great for Machine Translation (MT) and surely more deployable than any #NeuralEmpty tools. Although installation has come a long way, from offensively obtuse to simply bjam, there's still no reason to re-type the installation commands every time you deploy Moses on a new system/machine.
Ulrich Germann showed how it can be done easily on http://lectures.ms.mff.cuni.cz/view.php?rec=291.
The slides are on the MT Marathon 2015 Wiki; the username and password are both mtm.
TL;DR, just give me the docker file already: https://gist.github.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3
And the image on Docker hub: https://hub.docker.com/r/alvations/momo/
$ docker pull alvations/momo
$ docker run -it alvations/momo bash
Below we will walk through how the Docker container was created.
I'll choose the latest Ubuntu distro, but feel free to choose a distro of your choice.
To find the name(s) of the available Docker images, use the docker search command, e.g.
$ docker search ubuntu # Look for "ubuntu"
$ docker search centos # Look for "centos"
$ docker search windows # Look for "windows"
If you see a warning like:
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
According to http://stackoverflow.com/a/33596140/610569, do this:
sudo usermod -aG docker $(whoami)
Then log out and log back in. You can use the same su - $USER trick.
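A quick way to tell whether the daemon is reachable before going any further is to ask it for its status. This is just a sketch: docker_ready is a hypothetical helper name, and it assumes nothing more than docker info returning a non-zero exit code when it cannot reach the daemon.

```shell
# Hypothetical helper: succeeds only if the Docker client can talk to the daemon.
docker_ready() {
    docker info >/dev/null 2>&1
}

if docker_ready; then
    echo "Docker daemon is reachable"
else
    echo "Cannot reach the Docker daemon: check that it is running and that you are in the docker group"
fi
```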
On Mac OSX
(Just in case you're on a mac, though this is a guide for Linux)
$ docker-machine start # Start virtual machine for docker
$ docker-machine env # Show the environment variables for the machine
$ eval "$(docker-machine env default)" # Set environment variables
Start an empty Ubuntu image:
docker run -it ubuntu bash
You will be "teleported" into a bash shell inside an Ubuntu container. First, we update + upgrade the distro:
apt-get update
apt-get install -y apt-utils debconf-utils
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
apt-get update
apt-get -y upgrade
Possibly, you might see something like this at the end of the upgrade:
Setting up makedev (2.3.1-93ubuntu2~ubuntu16.04.1) ...
mv: cannot move 'console-' to 'console': Device or resource busy
makedev console c 5 1 root tty 0600: failed
You can just ignore it... For more details, see http://stackoverflow.com/questions/43269412/device-or-resource-busy-docker
For the sake of future sanity, let's install sudo, nano, perl and python too.
apt-get install -y sudo nano
apt-get install -y perl
apt-get install -y python-dev python3-dev python-pip python3-pip
We'll also install some common Unix tools (curl, wget, tar and dtrx):
apt-get install -y curl wget tar dtrx
Now, let's install Moses' dependencies:
apt-get install -y libboost-all-dev
apt-get install -y build-essential git-core pkg-config automake libtool wget zlib1g-dev python-dev libbz2-dev
apt-get install -y cmake
Still in the Docker container, after the apt-get -y upgrade, we create a sudo user that we'll be using from now on. Let's use the username ubiwan with the password mosesdocker:
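useradd -m -s /bin/bash ubiwan # create the user with a home directory
echo 'ubiwan:mosesdocker' | chpasswd # set the password (useradd -p expects an already-encrypted password, not plain text)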
usermod -aG sudo ubiwan # add user to sudo list
su - ubiwan # login to the ubiwan user
Henceforth, we'll use the ubiwan user to install Moses.
Still in the Docker container, after logging in as ubiwan, we continue with the Moses installation:
cd $HOME
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder
make -f contrib/Makefiles/install-dependencies.gmake
./compile.sh --max-kenlm-order=20 --max-factors=1000
cd $HOME
Now let's install MGIZA++ (the word aligner):
cd $HOME
git clone https://github.com/moses-smt/mgiza.git
cd mgiza/mgizapp
cmake .
make
make install
cp scripts/merge_alignment.py bin/
cd $HOME
We know that mkcls is EXTREMELY slow, so let's replace it with @jonsafari's clustercat:
cd $HOME
git clone https://github.com/jonsafari/clustercat.git
cd clustercat
make -j 4
cd $HOME
Finally, we create a directory to keep all the external binaries that Moses will need:
cd $HOME
mkdir moses-training-tools
cp mgiza/mgizapp/bin/* moses-training-tools/
cp clustercat/bin/clustercat moses-training-tools/
cp clustercat/bin/mkcls moses-training-tools/mkcls-clustercat
mv moses-training-tools/mkcls moses-training-tools/mkcls-original
cp moses-training-tools/mkcls-clustercat moses-training-tools/mkcls
cd $HOME
The moses-training-tools directory should contain these files:
$ ls moses-training-tools/
clustercat d4norm hmmnorm mgiza mkcls mkcls-clustercat
mkcls-original plain2snt snt2cooc snt2coocrmp snt2plain symal
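If you'd rather not eyeball the listing, here's a small sketch that reports which of the expected files are missing. The missing_tools helper is a hypothetical name; the file names are simply the ones from the listing above.

```shell
# Hypothetical helper: print the name of every expected tool that is missing from directory $1.
missing_tools() {
    for tool in clustercat d4norm hmmnorm mgiza mkcls mkcls-clustercat \
                mkcls-original plain2snt snt2cooc snt2coocrmp snt2plain symal; do
        [ -e "$1/$tool" ] || echo "$tool"
    done
}

# Prints nothing when all twelve files are in place.
missing_tools "$HOME/moses-training-tools"
```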
(Optional): Delete the source and minimize the Docker container size:
rm -rf mgiza/
rm -rf clustercat/
strip mosesdecoder/bin/* mosesdecoder/lib/* moses-training-tools/*
Let's exit from everything and leave the Docker container:
exit # out of the ubiwan user
exit # out of the Docker container
Now, we're ready to "save" our image.
First let's create a Docker Hub account. Go to Docker Hub and sign up.
Follow the instructions on https://docs.docker.com/engine/getstarted/step_five/ and create a repo named my-momo.
You should now find the my-momo repository at a URL containing your username, something like: https://hub.docker.com/r/<username>/my-momo/, where <username> is your username (without the angle brackets).
To push what we've built into the my-momo repo on your Docker Hub account, we must first commit the container to an image, and that image needs a name. Let's call it momo.
# Commit the last created container to an image named momo
docker commit $(docker ps -q -l) momo
Then, we tag the momo image to the my-momo repo:
# Tag *momo* image to *my-momo* repo in the Docker Hub
docker tag momo <username>/my-momo
Finally, we push the image into the my-momo repo:
# Push to *my-momo*
docker push <username>/my-momo
Voila, now you have the Moses Docker image on your Docker Hub repo at https://hub.docker.com/r/<username>/my-momo/ !!!
With Docker installed on the new machine and your my-momo repo in place, you can simply do this to get Moses running:
docker pull <username>/my-momo
docker run -it <username>/my-momo bash
Now you can train Moses models, decode and have fun!!!
Do read up on using a Dockerfile to install the tools/software you need in the Docker image too. It'll help a lot to automate the above steps. See the Dockerfile tutorial.
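To give a taste of what that automation might look like, here's a rough sketch that writes a small Dockerfile covering only the Moses part of the steps above. It is NOT the Dockerfile from my gist: the file name momo-sketch.dockerfile and the image name momo-sketch are made up for illustration, and the ubuntu:16.04 base tag is an assumption based on the release that shows up in the upgrade log earlier.

```shell
# Write a sketch Dockerfile; momo-sketch.dockerfile is a hypothetical file name.
cat > momo-sketch.dockerfile <<'EOF'
# Assumption: ubuntu:16.04, matching the release seen in the upgrade log of this guide.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
    build-essential git-core pkg-config automake libtool wget curl \
    zlib1g-dev python-dev libbz2-dev libboost-all-dev cmake
RUN useradd -m -s /bin/bash ubiwan
USER ubiwan
WORKDIR /home/ubiwan
RUN git clone https://github.com/moses-smt/mosesdecoder.git && \
    cd mosesdecoder && \
    make -f contrib/Makefiles/install-dependencies.gmake && \
    ./compile.sh --max-kenlm-order=20 --max-factors=1000
EOF

# Build it by piping the file to docker build (needs a running Docker daemon):
#   docker build -t momo-sketch - < momo-sketch.dockerfile
```

Every RUN line becomes a cached layer, so when a later step fails you don't have to redo the earlier ones by hand.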
If you've made it up to this point, you deserve this. (Or perhaps, you've just pressed the TL;DR link on the content page that leads to here -_-||| )
Without following any of the steps you can simply use the pre-prepared Moses Docker image I've created:
docker pull alvations/momo
docker run -it alvations/momo bash
Or if you would like to use/modify the Dockerfile: https://gist.github.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3
And to build the image from the Dockerfile:
wget https://gist.githubusercontent.com/alvations/f2727a6331e4a48c5a1905e47ef5c5f3/raw/7d566e3ae03443ae9b20bbf2da1dadf9c649e958/momo.dock -O momo.dockerfile
docker build -t momo - < momo.dockerfile
If you encounter a "cannot connect to the Docker daemon" error:
username@server:~/momodocker$ docker pull alvations/momo
Using default tag: latest
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
See http://stackoverflow.com/questions/21871479/docker-cant-connect-to-docker-daemon
If you see this, running the following will sometimes fix it:
username@server:~/momodocker$ eval "$(docker-machine env default)"
Other times, the next error is:
"Error checking TLS connection: Host is not running"
See docker-archive/toolbox#453
TL;DR, Try this:
$ docker-machine rm default
About to remove default
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
Successfully removed default
username@server:~/momodocker$ docker-machine create default --driver virtualbox
Running pre-create checks...
(default) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(default) Latest release for github.com/boot2docker/boot2docker is v17.04.0-ce
(default) Downloading /Users/liling.tan/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v17.04.0-ce/boot2docker.iso...
(default) Creating VirtualBox VM...
(default) Creating SSH key...
(default) Starting the VM...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env default
When we started a container with docker run -it ubuntu bash, Docker assigned it a container ID automatically.
To locate the container ID, we look up the container we created last:
docker ps -q -l
- -q: list only container IDs
- -l: list only the last created container
The command above will print something like e19f19d62ffd to the terminal.
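Since docker ps -q -l prints nothing when there is no container at all, it can be worth guarding the commit step. Here's a minimal sketch; commit_last_container is a hypothetical helper name, and it only wraps the docker ps and docker commit commands already used in this guide.

```shell
# Hypothetical helper: commit the most recently created container as image $1.
commit_last_container() {
    cid=$(docker ps -q -l)   # ID of the last created container, empty if there is none
    if [ -z "$cid" ]; then
        echo "no container found to commit" >&2
        return 1
    fi
    docker commit "$cid" "$1"
}

# Usage (needs a running Docker daemon):
#   commit_last_container momo
```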
See also, http://stackoverflow.com/q/21928691/610569 and http://stackoverflow.com/q/19585028/610569